Abstract

We study discrete-time quantum computation from a theoretical perspective. 

We describe some frameworks for universal quantum computation with limited 
control — namely the one-dimensional cellular automaton, and the qubit spin-chain 
with all access limited to one end of the chain — and we obtain efficient constructions 
for them. These two examples help show how little control is necessary to make a 
universal framework, provided that the gates which are implemented are noiseless. 
It is hoped that the latter example might help motivate research into novel solid- 
state computing platforms that are almost totally isolated from couplings to the 
environment. 

Recalling concepts from the theory of algorithmic complexity and quantum circuits, we establish some positive results about the computational power of the so-called "one clean qubit" model and its naturally defined bounded-probability polytime complexity class, showing that it contains the class ⊕L and giving oracles relative to which it is incomparable with P.

We study quantum computing models based on the Fourier Hierarchy, which is a 
conceptually straightforward way of regarding quantum computation as a direct 
extension of classical computation. The related concept of the Fourier Sampling 
oracle provides a subtly different perspective on the same mathematical construc- 
tions, again establishing the centrality of the Hadamard transform as one way of 
extending classical ideas to quantum ones. We examine quantum algorithms that 
naturally employ these concepts, recasting some well-known number-theoretic algo- 
rithms into these models. In particular, a detailed example is given of how, using 
only gates that would preserve the computational basis, it is possible to render a 
version of Shor's algorithm where initialisation and readout are performed in the 
Hadamard basis. 

In a similar vein, we study models based on the Clifford-Diagonal Hierarchy, and introduce the IQP oracle, illustrating that temporal complexity is not necessary for some notions of quantum complexity. We examine simple protocols that arise from these notions, in particular providing an example of a protocol that, it is hoped, could be of significant use in testing quantum computers that are rather limited in terms of computational depth. We also provide some analysis of the classical techniques that approximate the signalling required within such protocols, arguing that some specifically 'quantum' complexity can appear in the absence of temporal structure.
Preface



0.1 Overview 

Quantum algorithmics became widely recognised as a subject in its own right after 
the publication in 1994 of Shor's Algorithm [64] and later Grover's Algorithm [31] 
in 1996, these in turn having been inspired by the Deutsch-Jozsa algorithm of 1992. 
These ideas underpin algorithmic primitives that illustrate a superiority of quantum 
information processing over classical information processing for solving certain prob- 
lems whose statements and solutions are definable in purely classical terms, that is, 
without reference to the theory of Quantum Information Processing. 

This subject relates to many other disciplines, and as such enables many different 
valid approaches to be made to the potential 'real-world solution' of such problems. 
We employ a mathematical methodology (based on complexity theory) to formulate 
some new algorithmic constructions. It is worthwhile briefly exploring some of the 
more philosophical issues surrounding the subject of quantum information and al- 
gorithmics before engaging with the mathematics. Thus Chapter 1 is written in a 
non-rigorous style, freely borrowing notions from a range of authors, simply to put 
in place a few of the concepts that will be referred to in the more formally written, 
mathematically oriented, later sections. 

Following that, our goal is to cast some quantum algorithms into particular struc- 
tures or frameworks that reflect some kind of physical limitation. The motivation 
for doing this is to obtain new insight into such questions as 

• Which physical limitations do not significantly inhibit quantum computation? 
What is the 'simplest' architecture for a quantum computer? (Chapter 2.) 

• How useful are mixed states in quantum computing? (Chapter 3.) 

• Which physical limitations enable a ready comparison with classical computation?
Which quantum algorithms are 'close' to being classical? What is the 'simplest'
quantum subroutine? (Chapter 4.)

• Which physical limitations correspond to natural structures within the under- 
lying mathematics? What is the 'simplest' fundamentally quantum protocol? 
(Chapter 5.) 

The answers we give to these questions are by no means complete, unconditional, 
or uncontroversial; rather, they present a particular style and particular ways of 
thinking about quantum complexity that may be useful in the development of new 
algorithms or in the future design of quantum computing architectures. 
0.2 Previous publications

Much of the content of this dissertation has been published previously, and some of 
it is joint work. 

Chapter 2 

The first part is based on my paper [61], "Universally programmable quantum cel- 
lular automaton". This was joint work with Torsten Franz and Reinhard Werner. It 
was the first paper to give an explicit construction for a 'universal' one-dimensional 
cellular automaton, in the physically-motivated sense introduced by Schumacher and 
Werner (2004). 

Chapter 3 

The example of the "One pure qubit" model contains material from an unpublished 
paper of mine that is available from the quant-ph archive, quant-ph/0608132. 

Chapter 4 

This draws heavily from my paper [59], "On the role of Hadamard gates in quantum 
circuits". The theme of the 'Fourier Hierarchy', introduced by Yaoyun Shi (2003) 
and used in that paper, is developed further in this dissertation. Following [14], the 
term Fourier Sampling Oracle is used here. 

Chapter 5 

This is based on my paper [60], "Temporally unstructured quantum computation", joint work with Michael Bremner. In that paper, we introduced a new quantum protocol using the 'IQP model', on the basis of which the term IQP Oracle is defined. The work is recounted in this dissertation, with a slightly different emphasis.
0.3 Notations

Here we collect together some of the notations used in the dissertation that are 
perhaps not standard. 

Pauli matrices are denoted X, Y and Z. These often occur as unitary transforms, but they are also Hermitian operators which may be used to define projectors. For example, $(I+Z)/2$ is a projector from a two-dimensional space to a one-dimensional space, written in the Dirac notation as $|0\rangle\langle 0|$.



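$X := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y := \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$
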
Subscripts on such symbols are used to indicate which qubits they pertain to. For example, $X_a$ can refer to a unitary transform that 'flips' bit $a$ according to $|0\rangle \mapsto |1\rangle$ by applying $X$ to the qubit labelled by $a$, or alternatively $X_a$ can refer to an Hermitian operator that 'observes' the qubit labelled by $a$. The Hadamard operator (or matrix) is given by

$H := (X + Z)/\sqrt{2}.$
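As a quick sanity check on these conventions, the following sketch verifies that the projector $(I+Z)/2$ equals $|0\rangle\langle 0|$, that $H = (X+Z)/\sqrt{2}$ is both Hermitian and unitary, that conjugating $X$ by $H$ gives $Z$, and that $Y = iXZ$. It uses plain Python with nested lists so that no numerical library is assumed; the helper names (`add`, `scale`, `matmul`, `dagger`, `close`) are illustrative only.

```python
import math

# Single-qubit operators as 2x2 nested lists (row-major), in the computational basis.
I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


def scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(A):
    """Conjugate transpose (int, float and complex all provide .conjugate())."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]


def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


P0 = scale(0.5, add(I, Z))                 # (I + Z)/2
H = scale(1 / math.sqrt(2), add(X, Z))     # H = (X + Z)/sqrt(2)

print(close(P0, [[1, 0], [0, 0]]))         # (I + Z)/2 = |0><0|
print(close(matmul(P0, P0), P0))           # idempotent, hence a projector
print(close(dagger(H), H))                 # H is Hermitian ...
print(close(matmul(dagger(H), H), I))      # ... and unitary
print(close(matmul(matmul(H, X), H), Z))   # H X H = Z
print(close(Y, scale(1j, matmul(X, Z))))   # Y = iXZ
```

Each check should print `True`; the tolerance in `close` absorbs floating-point rounding from the factor $1/\sqrt{2}$.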

Many of the unitary operators we employ are 'controlled gates', whose matrix representation with respect to the computational basis takes the form

$\begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix}.$
Such a matrix may be written $\Lambda(U)$, where the $\Lambda$ symbol denotes 'control' (informally, "apply gate $U$ to some qudit conditioned on some other control qubit being set"). When there are multiple controls, we use a superscript to count them, so $\Lambda^2(X) = \Lambda(\Lambda(X))$, for example, would denote the Toffoli gate. Superscripts on unitaries (e.g. $U^2$) denote sequential application of a unitary transform to a qudit, which is the same concept as raising to a power, algebraically. Where parallel application is intended, we write $U^{\otimes 2}$ to mean $U$ applied on both of two qudits in parallel. Again, subscripts are generally used to indicate which qudits are acted on by $U$ and which qubits are 'controls' for the gate (the ordering of control qubits is immaterial). Ranges of qudits may be specified for large unitaries, e.g. $\Lambda^4_{1..4}(U_{5..7})$ would denote applying a three-qudit unitary $U$ across sites numbered 5 through 7, conditional on the four qubits in sites 1 through 4 being in the state $|1111\rangle$.
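The block form of a controlled gate (written $\Lambda(U)$) can be built mechanically, and nesting the construction gives multiply-controlled gates. The sketch below assumes, for illustration, that the control is the most significant qubit, so that the basis states with the control set occupy the lower-right block; it then checks that $\Lambda(\Lambda(X))$ acts as the Toffoli gate. The function names are hypothetical helpers, not notation from the text.

```python
def controlled(U):
    """Return Lambda(U) = diag(I, U): apply U to the target(s) only when the control is |1>.

    Assumed convention: the control is the most significant qubit."""
    n = len(U)
    M = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        M[i][i] = 1                      # control |0>: identity block
        for j in range(n):
            M[n + i][n + j] = U[i][j]    # control |1>: U block
    return M


def basis_image(M, j):
    """For a permutation matrix M, return the index i with M[i][j] == 1 (the image of |j>)."""
    return next(i for i in range(len(M)) if M[i][j] == 1)


X = [[0, 1], [1, 0]]
CNOT = controlled(X)                 # Lambda(X)
TOFFOLI = controlled(CNOT)           # Lambda^2(X) = Lambda(Lambda(X))

# The Toffoli gate swaps |110> and |111> (indices 6 and 7) and fixes every other basis state.
print([basis_image(TOFFOLI, j) for j in range(8)])   # [0, 1, 2, 3, 4, 5, 7, 6]
```

Because the order of the control qubits is immaterial, the same permutation results whichever of the two control qubits is treated as the 'outer' control.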
Chapter 1



Approach to Complexity 



This chapter presents a little background to quantum complexity; it is by no means a complete introduction. It can readily be skipped by the reader already familiar with the field.



1.1 Ontology 

There is famously much diversity in describing what various quantum statements 
might really be saying about the world. Bearing in mind the basic notions of statis- 
tical mechanics and of Everett's interpretation (see [25]), we will begin by providing 
a brief sketch of how one might choose to understand the ontology of quantum pro- 
cesses, with the hope that this may help provide clarity for some of the phraseology 
used later in this dissertation. (Our goal in this opening section is to 'tell one story 
about reality', rather than contrast the various options.) 

1.1.1 A model for dynamics 

Let TIME be modelled as a real number line, parameterised by $t$. Consider a state vector $\Psi = \Psi(t)$, a mathematical function of TIME, whose role is to encapsulate the total description of all that may be said about a system; that is, a complete objective description of a system. One may wish to ask questions about how the system evolves with time, and this line of thinking we refer to as DYNAMICS:

$\Psi' = \frac{d\Psi}{dt}.$

There is an underlying anticipation that this model should provide a way of ap- 
proximating reality, using smooth functions for the state vector. Of course there 
are plenty of reasons to think that the model is far too naive to capture anything 
like the 'whole' of physics, not least because the model for TIME here is entirely non-relativistic. These concerns aside, we see that the immediate ontological problem intrinsic to the model arises from the notion of linearity. That is, since (real) analysis makes it clear that the derivative operator is linear, we have it axiomatically that

$(\Psi + \Phi)' = \Psi' + \Phi',$

whatever $\Psi$ and $\Phi$ might really be; but the model so far says nothing about the meaning of the + signs on either side of this equation. Put another way, there is nothing mysterious about why DYNAMICS should be linear, but linearity itself does not come automatically equipped with a physical (semantic) interpretation.

If the role of $\Psi$ is taken to be the encapsulation of all that describes a system at a given time, and if $\Phi$ is supposed to be of the same category, then we see that these symbols indicate different possible configurations of the same system. Then it is most natural to infer that $\Psi + \Phi$ might encode a state of the system 'being $\Psi$ and/or also being $\Phi$', and so we might associate linearity with the intuitive (yet quantitative) notion of probability.

The underlying philosophical notions of Probability Theory are notoriously difficult to define rigorously (cf. [30]). With the 'Frequentists', we could attempt to reject all notions of probability that do not ultimately depend upon the counting of ontologically real conditions or events. Alternatively, with the 'Bayesians', we could adopt Probability Theory as a means of describing relationships between prior and posterior subjective states, according to experimental data. A problem with the first approach is that it is not especially powerful, since it is limited in scope to cases where there is something definite to count. A problem with the second approach is that it is hard to find a meaning for a prior distribution, and even then, the resulting posterior distribution depends heavily on the choice of experiments made and data collected. Instead, we could choose to overlook the precise meaning of probability for the time being, identifying it simply as propensity in the modern sense (cf. [7]), and ask about the possible forms of solutions to the equation above.

1.1.2 Classical computation 

For a classical model, one would take $\Psi$ and $\Phi$ to be stochastic vectors (assumed finite-dimensional for this discussion) and, on discretising TIME, take the set of allowable linear transformations to be the stochastic linear maps. Then the allowable linear combinations would be the convex ones, and the interpretation of the state vectors themselves would be as belonging to a vector space having a basis that constitutes the possible 'actual' (i.e. objective) configurations of the system; we'll call this the computational basis. The convex combination of 'actual' states then merely denotes a probabilistic mix (a derived concept subject to whatever we later decide 'probability' means).

The dynamics of this kind of computation come with no guarantee of time symmetry, 
since many stochastic maps do not possess an inverse. Thus there is the possibility 
of 'computational heating' of a state vector (increasing Shannon entropy, perhaps 
by setting a bit of memory to be random, for example) or 'computational cooling' 
of a state vector (e.g. perhaps by resetting a bit of memory to 0). This provides 
something of a backdrop for classical randomised computation. Indeed, the classical 
theory of Turing machines requires little more than this for an ontological framework. 
(By restricting to the rational field instead of the real field, we can even make a purely 
Frequentist interpretation, because then one can normalise the vectors onto the 
integer lattice and speak reasonably unambiguously about counting computational 
paths, interpreting probabilities in the ordinary fashion.) 
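To make the 'heating' and 'cooling' examples concrete, the sketch below represents a single classical bit by a probability vector and applies two column-stochastic maps: one that sets the bit to a fair coin flip, and one that resets it to 0. Both maps have determinant zero, so neither possesses an inverse, which is the missing time symmetry described above; the bit-flip map is included for contrast as an invertible stochastic map. The matrix names are illustrative.

```python
import math


def apply(S, p):
    """Apply a column-stochastic matrix S (each column sums to 1) to a probability vector p."""
    return [sum(S[i][j] * p[j] for j in range(len(p))) for i in range(len(S))]


def shannon_entropy(p):
    """Shannon entropy in bits."""
    return sum(-q * math.log2(q) for q in p if q > 0)


def det2(S):
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]


RANDOMISE = [[0.5, 0.5], [0.5, 0.5]]  # 'heating': whatever the input, output a fair coin
RESET = [[1.0, 1.0], [0.0, 0.0]]      # 'cooling': whatever the input, output 0
NOT = [[0.0, 1.0], [1.0, 0.0]]        # a reversible (invertible) stochastic map, for contrast

p = [1.0, 0.0]                 # the bit is definitely 0
hot = apply(RANDOMISE, p)      # [0.5, 0.5]: entropy rises to 1 bit
cold = apply(RESET, hot)       # [1.0, 0.0]: entropy falls back to 0

print(shannon_entropy(hot), shannon_entropy(cold))
print(det2(RANDOMISE), det2(RESET), det2(NOT))
```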

There is also a notion of Reduction, which is a subjective operation to be applied on 
a state vector, to reduce it (stochastically) to a computational basis vector within 
its support, that is, to choose to 'realise' one of the possibilities for the state. This 
operation is subjective not least because it is non-linear (and therefore cannot be part 
of DYNAMICS), but also because its meaning depends on what we decide probability 
really is. One can think of Reduction (also called 'state collapse' in the quantum 
world) as a spontaneous change in the scope of what is actually being modelled, 
rather than a change or evolution of objective state itself, as when, for example, one 
chooses to consider a single possibility instead of considering many possibilities at the 
same time. Yet the most important aspect of the classical model — as contrasted with 
the quantum alternative — is that this subjective Reduction effectively commutes 
with the objective DYNAMICS of the model. In symbols, if $R$ denotes the subjective stochastic choosing of a computational basis vector, and $S$ denotes any objective stochastic transformation, then

$R(S \cdot \Psi) = R(S \cdot R(\Psi)).$

The implication of this is that it makes no difference to the meaning of the compu- 
tation how we understand R, because the action of R can always be pushed through 
to the end of the computational procedure, and therefore effectively ignored. This 
seems to be commensurate with the common understanding of what randomness 
really is, i.e. a purely subjective uncertainty that can be effectively ignored until 
required. The relevant maxim here is, "Classical computation paths do not inter- 
fere." 
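Because a stochastic map acts linearly on probability vectors, the two sides of $R(S \cdot \Psi) = R(S \cdot R(\Psi))$ yield the same distribution over outcomes. The sketch below checks this exactly with rational arithmetic for an arbitrarily chosen three-state stochastic map. As a simplifying assumption, $R$ is represented by the distribution over basis vectors it would choose from, rather than by an actual random sample.

```python
from fractions import Fraction as F


def apply(S, p):
    """Apply a column-stochastic matrix S to a probability vector p (exact arithmetic)."""
    return [sum((S[i][j] * p[j] for j in range(len(p))), F(0)) for i in range(len(S))]


def basis(n, j):
    return [F(1) if k == j else F(0) for k in range(n)]


def reduce_then_apply(S, p):
    """Distribution when R acts first: basis state j is 'realised' with
    probability p[j], and then S acts on that basis vector."""
    out = [F(0)] * len(S)
    for j, pj in enumerate(p):
        image = apply(S, basis(len(p), j))
        out = [o + pj * q for o, q in zip(out, image)]
    return out


# An arbitrary 3-state stochastic map (each column sums to 1) and input distribution.
S = [[F(1, 3), F(1, 2), F(0)],
     [F(2, 3), F(0), F(1, 4)],
     [F(0), F(1, 2), F(3, 4)]]
p = [F(1, 2), F(1, 3), F(1, 6)]

print(apply(S, p) == reduce_then_apply(S, p))   # True: R can be pushed to the end
```

Exact fractions are used so that the equality holds on the nose rather than up to rounding.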
1.1.3 Quantum computation

The quantum model, as we describe it, takes a different approach. For reasons of quantification, it is still appropriate to think of vector spaces with a metric. This time, we expect $\Psi$ and $\Phi$ to be normalised vectors, i.e. having unit Euclidean length. The set of transformations that preserves this property is constituted by the orthogonal group, or more generally the unitary group. The symbol $U$ denotes for us an arbitrary unitary transform, and replaces the stochastic transform $S$ of classical dynamics. Now there is time-symmetry, in the sense that, since these transformations form a group, every transformation possesses a valid inverse. To ensure that the Lie algebra is algebraically closed, we may as well take the underlying field to be complex, in which case the appropriate kind of (positive-definite) metric is the Hermitian inner product. With respect to this inner product, the computational basis is taken to be orthonormal. The inner product between an 'object' vector and a 'reference' unit vector is called an amplitude (and we speak of the amplitude of the object in the direction of the reference).

The 'specialness' of the computational basis is no longer geometrically significant:
without reference to a specific physical model, there is much symmetry within the 
Lie group of transformations, and so no particular reason to prefer one orthonormal 
basis over another. And so, within a closed system, there is no notion of 'heating' 
or 'cooling', because the (von Neumann) entropy of a state vector is always zero. 
The full unitary group acts transitively on the projective space, and so no one state 
is intrinsically different from any other. Notions of entropy, entanglement, and 
mixedness, do not properly arise until we consider dividing a system into two parts, 
or consider the meta-system of 'system-plus-environment'. 

It is well understood that, by taking a tensor decomposition of a finite-dimensional system into two (or more) parts, notions of entanglement (Everett's "relative states", [25]) become possible. Furthermore, by imposing limits on the allowable dynamics across the two parts, and by considering one's computation to take place on the smaller of the two parts, it is possible to recreate all of the features of the classical model within the smaller component, including heating, cooling, and irreversibility. This phenomenon of decoherence¹ goes some way to justify the claim that classical mechanics is a 'subset' of the more complete quantum mechanics. Decoherence is both
necessary for quantum computation (for example to enable a system to be cooled 
into an initial starting state), and yet also problematic (because it can inhibit the 
'quantum' features of the computation, if not precisely handled). 

¹ The term decoherence is used a little differently in so-called 'non-Everettian' interpretations of quantum mechanics.

The notion of Reduction for a quantum system turns out to be closely related to the notion of decoherence. Everett derives the Born rule from simple considerations about normalisation, this being the only stochastic measure valid for all states. This rule tells us that the action of Reduction on a quantum state consists in the stochastic choice of a computational basis vector according to the square of the modulus of the amplitude of that state in the direction of the choice:

$R(\Psi) \to b_j$ with probability $|(\Psi, b_j)|^2.$

Everett writes [25], "In other words, pure [unitary] wave mechanics, without any 
initial probability assertions, leads to all the probability concepts of the familiar 
formalism." As with the classical case, this Reduction operation is not linear, and 
so not a part of DYNAMICS, i.e. not to be taken as intrinsically objective. The 
sometimes counterintuitive aspect of quantum information processing could then be 
said to derive from the fact that this Reduction does not commute with the set of 
allowable transformations, in sharp contrast to the more familiar classical theory. 
This means that one is not at liberty to apply this simplifying Reduction at arbitrary 
stages of processing. Indeed, a priori one is not at liberty to apply this simplifying 
Reduction at all, without a clear justification. 
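The failure of Reduction to commute with unitary DYNAMICS already shows up for a single qubit. The sketch below applies the Hadamard gate twice to $|0\rangle$, once with no intermediate Reduction and once with a Born-rule Reduction between the two gates. As a simplification, exact outcome distributions are computed instead of random samples; the choice of the Hadamard gate is illustrative.

```python
import math

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]    # Hadamard gate
ket0 = [1, 0]            # computational basis state |0>


def apply(U, psi):
    return [sum(U[i][j] * psi[j] for j in range(len(psi))) for i in range(len(U))]


def born(psi):
    """Born rule: outcome j occurs with probability |(psi, b_j)|^2."""
    return [abs(a) ** 2 for a in psi]


# (a) No intermediate Reduction: apply H twice, then measure.
without_reduction = born(apply(H, apply(H, ket0)))

# (b) Reduction between the two gates: realise |j> with its Born probability,
#     apply the second H to that basis vector, and average the outcome distributions.
with_reduction = [0.0, 0.0]
for j, pj in enumerate(born(apply(H, ket0))):
    basis_j = [1 if k == j else 0 for k in range(2)]
    for i, q in enumerate(born(apply(H, basis_j))):
        with_reduction[i] += pj * q

print(without_reduction)  # approximately [1, 0]
print(with_reduction)     # approximately [0.5, 0.5]
```

The two answers differ because $H$ has a negative entry; with a stochastic matrix in its place, the averaging in (b) would reproduce (a) exactly, as in the classical case.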

The justification we need, at least for making sense of ideas within the field of compu- 
tation and algorithmics, comes from measurement. Besides initial input preparation, 
measurement is the main place where decoherence becomes an essential physical fea- 
ture of the description of quantum information processing. Complete measurement 
describes the action of choosing an orthonormal basis for a target system and then 
applying an entangling operation between that system, with respect to that basis, 
and a suitably prepared external system (a measurement apparatus), before sepa- 
rating the two systems to prevent further interaction. The entangling operation is 
simply a unitary map that will have the effect of simulating decoherence of the tar- 
get system in the required basis when it is subsequently considered separately from 
the measurement apparatus. To be precise, measurement does more than introduce 
decoherence, because it also records between the target system and the measure- 
ment device a quantum correlation pertaining to the information which has been 
ipso facto measured. 

The important point here is that measurement supervenes Reduction, so that (in symbols) if $M$ denotes measurement (with respect to an unrepresented measurement apparatus) and $R$ denotes subjective Reduction, then

$M \cdot R(\Psi) = M \cdot \Psi = R(M \cdot \Psi).$

Note that M is objective; it is perfectly linear (indeed unitary) on the space of 
the system tensored with the measurement apparatus, though it acts non-linearly if 
considered on the target system alone. Thus, we use external measuring systems (and the decoherence they bring) in order to give a physical and operational meaning
to the otherwise vague notion of Reduction. When measurement happens, states 
'collapse', whether we like it or not. 
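A minimal two-qubit sketch of this picture follows, assuming (for illustration only) that the entangling operation is a controlled-NOT from the target system onto a one-qubit apparatus prepared in $|0\rangle$; the text does not prescribe a particular gate. Starting from the system in $(|0\rangle + |1\rangle)/\sqrt{2}$, the reduced density matrix of the system loses its off-diagonal (coherence) terms once the apparatus has recorded the computational-basis value, and a subsequent Hadamard on the system no longer recombines the two branches. The index convention `2*system + apparatus` is likewise an assumption.

```python
import math

s = 1 / math.sqrt(2)


def cnot_system_to_apparatus(psi):
    """Unitary 'measurement' M: copy the system's basis value into the apparatus.
    Amplitudes are indexed as 2*system + apparatus, i.e. [00, 01, 10, 11]."""
    a00, a01, a10, a11 = psi
    return [a00, a01, a11, a10]


def hadamard_on_system(psi):
    a00, a01, a10, a11 = psi
    return [s * (a00 + a10), s * (a01 + a11), s * (a00 - a10), s * (a01 - a11)]


def reduced_system(psi):
    """Partial trace over the apparatus: rho[a][b] = sum_m psi[2a+m] * conj(psi[2b+m])."""
    return [[sum(psi[2 * a + m] * complex(psi[2 * b + m]).conjugate() for m in range(2))
             for b in range(2)] for a in range(2)]


def p_system_zero(psi):
    return abs(psi[0]) ** 2 + abs(psi[1]) ** 2


plus_and_ready = [s, 0, s, 0]                  # system |+>, apparatus |0>
measured = cnot_system_to_apparatus(plus_and_ready)

rho_before = reduced_system(plus_and_ready)    # off-diagonal about 0.5: coherent
rho_after = reduced_system(measured)           # off-diagonal 0: decohered

p_before = p_system_zero(hadamard_on_system(plus_and_ready))  # about 1
p_after = p_system_zero(hadamard_on_system(measured))         # about 0.5
print(rho_after, p_before, p_after)
```

The diagonal of the reduced density matrix is unchanged by the entangling step, so computational-basis statistics are the same whether or not Reduction is imposed first, in keeping with $M$ supervening $R$.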

If a quantum information process begins with a suitably decoherent initialisation 
procedure (such as the preparation of an array of qubits into separate unentangled 
computational basis states) and ends with a (complete) measurement, then it makes 
perfect sense to regard the Reduction process R as happening at the beginning and 
end of the computation, where the system is 'apparently classical'. But since R and 
U do not necessarily commute, we must avoid imposing R at intermediate points 
within the computation, in between unitary dynamics. Pragmatically this means 
that we must avoid 'accidental measurement', or indeed any undesired decoherence 
of 'important' data, throughout the lifetime of the computational process. This is 
what serves to distinguish a quantum computer from a classical one. (See the lecture 
transcripts at [2] for a gentle — yet remarkably effective — introduction to this kind 
of abstract approach.) 

This provides enough of an ontological framework for the definition and analysis of quantum Turing machines and the various other conceptual devices one comes across in quantum computational complexity theory, without direct recourse to the physics of quantum mechanics itself.

1.1.4 Refining the model 

Dimensions 

For convenience, we have been restricting attention to finite dimensional vector 
spaces, and will continue to do so for studying algorithmics, for the most part. On 
the few occasions where infinite dimensional vector spaces are more appropriate, a 
sufficient mathematical treatment will be given. 

Time 

As well as thinking of TIME as a real line, and DYNAMICS as proceeding via the 
unitary Lie algebra of Hamiltonian actions on the vector space, we have also found 
it often convenient to discretise TIME, working with (a countable subgroup of) the 
Lie group of unitary gates as though they were 'atomic' transformations. The study 
of algorithmics uses both notions, continuous and discrete, usually depending upon 
assumptions about the underlying physical architecture. 
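As a small illustration of how the continuous and discrete pictures meet, the sketch below exponentiates the Hamiltonian $K = X + Z$ (an assumed example; any Hermitian generator would serve) using a truncated Taylor series, with the convention $U(t) = e^{-itK}$ and $\hbar = 1$. Because $K^2 = 2I$, one has $e^{-itK} = \cos(\sqrt{2}\,t)\,I - i\sin(\sqrt{2}\,t)\,K/\sqrt{2}$, so at $t = \pi/(2\sqrt{2})$ the continuous evolution yields the discrete Hadamard gate up to the global phase $-i$.

```python
import math

I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def expm(A, terms=40):
    """Matrix exponential of a 2x2 matrix by the truncated Taylor series sum_k A^k / k!."""
    result = [row[:] for row in I]
    term = [row[:] for row in I]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]   # now A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


K = [[X[i][j] + Z[i][j] for j in range(2)] for i in range(2)]       # Hamiltonian K = X + Z
t = math.pi / (2 * math.sqrt(2))                                     # evolution time
U = expm([[-1j * t * K[i][j] for j in range(2)] for i in range(2)])  # U = exp(-itK)

s = 1 / math.sqrt(2)
minus_i_H = [[-1j * s, -1j * s], [-1j * s, 1j * s]]                 # -i * H

print(all(abs(U[i][j] - minus_i_H[i][j]) < 1e-10 for i in range(2) for j in range(2)))
```

A global phase has no effect on measurement statistics, so for algorithmic purposes this time-$t$ evolution can be treated as the 'atomic' gate $H$.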
None of our treatments uses a generally covariant treatment of TIME as an aspect
of SPACETIME, since relativistic effects are considered unlikely to be of significant 
philosophical relevance to complexity theories underpinned by a pragmatic control 
theory, and such notions require a distinctly deeper ontology to make sense. See e.g. 
[50] for a thoroughgoing guide to the geometric principles involved in 'quantizing 
gravity'. 

Mostly we shall prefer the discrete picture for TIME, since it is well adapted to 
discussing both classical and quantum models, whereas the continuous picture does 
not apply so well in the classical case. In fact, this goes some way to illustrate how 
the classical model is simply not 'native' to the set of assumptions that we began 
with in §1.1.1. In the discrete picture, an 'atomic' dynamic component or evolution 
is called a gate, for both classical and quantum computing. 

Computational paths 

It is usual within the study of classical algorithms to speak of computational paths, 
as alluded to previously. Whenever a discrete time model is employed, one may 
understand a computational path, in a counterfactual sense, to be the series of states 
that would be followed by the DYNAMICS of the computation were Reductions 
(with respect to the computational basis) to be made before and after each gate. 
(One sometimes speaks of the Universe "splitting into many worlds" in this context, 
though this is really an artefact of the subjective Reduction process.) For quantum 
algorithmics, the non-commutativity of gates with Reductions is tantamount to the 
maxim, "Quantum computational paths can interfere." 

For quantum computing, it is more appropriate to regard a computational path as 
tracking not only the computational basis vector 'realised' (counterfactually) at each 
point in (discrete) time, but also the amplitude that the objective state vector holds 
in that direction at that time: in general, both of these kinds of information are
relevant to a computational process. The square of the modulus of the amplitude 
then provides the path with its own 'weight', and there is also a phase, which is 
the argument of the amplitude. Significantly, phases may be negative as well as 
positive (and if we use an algebraically closed field, we may take them to be complex 
also). Then the 'interference' between computational paths derives precisely from 
the fact that when a set of paths is regarded together for Reduction (i.e. when 
a measurement is made), it is the linear combination of paths terminating with 
the same computational basis vector that determines the probabilities relevant to 
the stochastic choice of an 'output'. In other words, we consider that there is 
no canonical 'actual history' to a particular computation: nothing of the sort,
"This is the path which the computation took." Rather, two paths with the same 
final computational basis state will constructively interfere or destructively interfere according to whether they have the same phase (are 'in phase') or have opposite phases (are 'out of phase').
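The path picture can be made concrete for a single qubit. The sketch below enumerates every computational path through a sequence of gates, taking the amplitude of a path to be the product of the matrix entries along it, and then sums the amplitudes of paths that end on the same basis vector. For two Hadamard gates applied to $|0\rangle$ there are four paths, each of weight $1/4$; the two ending at $|1\rangle$ carry opposite signs and cancel, while the two ending at $|0\rangle$ reinforce. The function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import product

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]


def path_amplitudes(gates, start):
    """Map each computational path (the basis state after each gate) to its amplitude:
    the product of the entries U[next][previous] along the path."""
    dim = len(gates[0])
    paths = {}
    for states in product(range(dim), repeat=len(gates)):
        amp, prev = 1, start
        for U, nxt in zip(gates, states):
            amp *= U[nxt][prev]
            prev = nxt
        paths[states] = amp
    return paths


def final_amplitudes(paths):
    """Interference: add the amplitudes of all paths that end at the same basis state."""
    totals = defaultdict(complex)
    for states, amp in paths.items():
        totals[states[-1]] += amp
    return dict(totals)


paths = path_amplitudes([H, H], start=0)
amps = final_amplitudes(paths)
weights = {k: abs(a) ** 2 for k, a in paths.items()}   # each path has weight 1/4
probs = {k: abs(a) ** 2 for k, a in amps.items()}      # approximately {0: 1, 1: 0}
print(paths)
print(probs)
```

With a stochastic matrix in place of $H$, every path weight would be a non-negative probability and no cancellation could occur, which is the classical maxim in another form.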

Thus, computational paths have a distinctly more subjective flavour within a quan- 
tum process than within a classical one, arising from the fact that the computational 
basis is arbitrary within a quantum process, rather than a part of the objective 
description within a classical one. Nonetheless, the concept has remained firmly 
entrenched within the conceptual framework of quantum algorithmics, and has its 
uses within some of the non-physical definitions in the field of algorithmic complex- 
ity. 

1.2 Complexity 

What counts as (quantum) information, and how do we decide whether the processing that it has been subject to is 'quantum'? To date, no truly convincing quantum
computer of significant computational power has been presented, but many phys- 
ical experiments have shed light on what quantum information processing might 
mean. 

The popular method for giving quantitative rigour to the various notions of quantum 
information processing involves asymptotic computational complexity analysis, which 
involves finding upper- and lower bounds on the resource requirements of certain 
algorithmic tasks (usually classically defined) in the asymptotic limit of arbitrarily 
large problem instances, when certain constraints apply. Resource requirements can 
include a range of parameterisable constraints, most notably TIME and SPACE, in 
some sense. It is appropriate that conditions for algorithmic tasks may also include 
certain non-physical constraints, such as quantified bounds on success probability, 
or costed (and well-defined) oracular access to particularly relevant mathematical 
functions. Much of the literature on quantum algorithmic complexity derives from 
similar notions and results from the classical theory of algorithmic complexity, and 
many of the notions carry over ('quantize') very naturally. 

In this section, we briefly recall some of the various different elements and notions 
that will be useful in the forthcoming discussion. In particular, some mention is 
made of the Turing machine model and the circuit model, as these concepts are 
referred to throughout the dissertation; but nowhere do we use the random access 
memory model, this latter being more relevant to the kinds of highly complex 'large- 
SPACE' algorithms that are not the subject of this study. For more background on 
the classical concepts, we recommend reference to [49], and for the quantum ones, 
see [47]. 
1.2.1 Architectures

Many architectures have been proposed for the construction of a quantum computer. 
The earliest algorithms were considered in a model based on networks of small uni- 
tary gates, but recent years have seen ideas like the one-way quantum computer in 
which the non-unitary acts of measurement play a key role for data processing, or 
adiabatic computing in which continuous time dynamics are used. Studies showing 
how different quantum computational models can simulate each other are valuable in 
constructing universal paradigms. They also provide perhaps the clearest expression 
of the primitives in each computational model that are responsible for generating 
computational power seemingly 'stronger' than that of classical computation. Since 
the major obstacles against useful quantum computation are considered likely (for 
a long time) to be engineering difficulties in implementation, a further incentive for 
such alternatives in underlying architecture is to generate ideas of how to adapt 
the computational model to various different 'limited' sets of primitives. This the- 
sis investigates some particular paradigms for architectures for quantum computing 
devices, exploring both universal computation and limited computation. The emphasis is always on understanding how a particular limitation or restriction of some aspect of the computational process can (or cannot) inhibit some particular kind of operational algorithmic process.

1.2.2 Computational tasks 

Decision languages 

Usually it is possible to examine much about the computational power of some 
computing paradigm by asking about the complexity classes of decision languages 
associated to it. In simple terms, a decision language is just a subset of some 'simple' 
countably infinite set (usually the positive integers or the finite-length bitstrings) 
that can be 'decided' by some operational (or more fanciful) means. Note that the 
theory of computational complexity — dealing with the resources needed to address 
a computational task — differs from the theory of recursion — dealing with whether a 
task would be 'possible' if resources were unconstrained. Within the former theory, 
we will always have in mind some pragmatic limit on some resource, such that the 
possibility of a machine's never halting is of absolutely no consequence. 

By way of example, we recall a few common complexity classes (cf. [49]):

• $\mathcal{L} \in \mathrm{P}$ if there exists a (deterministic) machine that accepts x within
polynomial time (i.e. $\mathrm{size}(x)^{O(1)}$) whenever $x \in \mathcal{L}$, but which rejects those x not
in $\mathcal{L}$. Informally, P is often considered to be the class of languages "efficiently
decided".
• $\mathcal{L} \in \mathrm{L}$ if there exists a (deterministic) machine that accepts x within
logarithmic space ($O(\log(\mathrm{size}(x)))$) whenever $x \in \mathcal{L}$, but which rejects those x not
in $\mathcal{L}$. (Space here refers to the amount of computational storage/workspace
required by the machine, not the space required to submit the actual input x.)

• $\mathcal{L} \in \oplus\mathrm{L}$ if there exists a (randomized, classical) machine that accepts x within
logarithmic space ($O(\log(\mathrm{size}(x)))$) on an odd number of computational
paths whenever $x \in \mathcal{L}$, but which accepts x on an even number of paths
when it is not in $\mathcal{L}$. (Computational paths here are required to be all of equal
length for a given input string x.)

• $\mathcal{L} \in \mathrm{NP}$ if and only if there exists a nondeterministic machine that accepts
x with some non-zero probability, within polynomial time, whenever $x \in \mathcal{L}$
(and with probability zero otherwise). This notion is given an operational meaning
of sorts (and unambiguously generalised in other contexts) by observing that it is
equivalent to saying that for some other language $\mathcal{L}' \in \mathrm{P}$, the item x lies in $\mathcal{L}$
if and only if there is some w such that the concatenation item $(x, w)$ lies in $\mathcal{L}'$
(and w is then called the witness to that fact). For this equivalence to hold, size(w)
must be polynomially bounded in size(x). Informally, NP is often considered to be
the class of languages "efficiently verified".

• $\mathcal{L} \in \mathrm{PP}$ if there is a probabilistic machine that accepts x with probability
strictly greater than $\frac{1}{2}$, within polynomial time, whenever $x \in \mathcal{L}$, &c. This
class is again syntactic in the sense that to specify a well-formed probabilistic
machine is to specify a PP decision language; but it is not operational in the
sense that there is no particular way to make use of that machine to form an
actual real-world decision, because the probabilities in question might turn out
to be exponentially close to the $\frac{1}{2}$ threshold.

• $\mathcal{L} \in \mathrm{BPP}$ if there is a probabilistic machine that accepts x with probability
at least $\frac{2}{3}$, within polynomial time, whenever $x \in \mathcal{L}$, but rejects x with
probability at least $\frac{2}{3}$ (again within the polynomial time bound) whenever $x \notin \mathcal{L}$.
This class is called semantic (as opposed to syntactic) because its definition does
not make clear exactly when an arbitrary machine might happen to display
the required probability bounds consistently, for all x. For example, the
existence of even a single x with an acceptance probability strictly between
$\frac{1}{3}$ and $\frac{2}{3}$ would prevent the machine in question from issuing a BPP decision
language under this definition. But the class BPP is nonetheless operational
in flavour: when the probabilities are promised to be bounded away from $\frac{1}{2}$
as described, parallel or sequential repetition of the computation (with a majority
vote) boosts the success probability of $\frac{2}{3}$ exponentially close to unity, still
all within polynomial time, at which point it becomes pragmatically beyond
doubt whether or not x lies within $\mathcal{L}$. Cf. §3.1.
• $\mathcal{L} \in \mathrm{BQP}$ if there is a quantum machine for $\mathcal{L}$ that accepts or rejects x
with the same $\frac{2}{3}$ versus $\frac{1}{3}$ probability bounds as for BPP, again running in
polynomial time. Again, this is operationally meaningful independently of
the details of the ontology used to interpret the meaning of probability in a
quantum context, using the same "Chernoff bounds" argument as for BPP.
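The boosting argument for BPP (and likewise BQP) can be checked numerically. The following is a minimal Python sketch, assuming independent repetitions that each answer correctly with probability $\frac{2}{3}$ and a majority vote over an odd number k of runs; the function names are illustrative only. It compares the exact binomial error of the vote with the Hoeffding form of the Chernoff bound, $e^{-2k(p - 1/2)^2}$.

```python
import math


def majority_error(p_correct: float, k: int) -> float:
    """Exact probability that a majority vote over k independent runs is wrong,
    when each run is correct with probability p_correct (k odd, so no ties)."""
    if k % 2 == 0:
        raise ValueError("use an odd number of repetitions")
    # The vote is wrong exactly when at most (k - 1) // 2 runs are correct.
    return sum(
        math.comb(k, j) * p_correct**j * (1 - p_correct) ** (k - j)
        for j in range((k - 1) // 2 + 1)
    )


def hoeffding_bound(p_correct: float, k: int) -> float:
    """Chernoff-Hoeffding upper bound exp(-2 k delta^2), delta = p_correct - 1/2."""
    delta = p_correct - 0.5
    return math.exp(-2 * k * delta * delta)


for k in (1, 3, 11, 51, 101):
    print(k, majority_error(2 / 3, k), hoeffding_bound(2 / 3, k))
```

The error falls off exponentially in k, so a polynomial number of repetitions already pushes it far below any fixed threshold.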



Interactive protocols 

There are computational tasks more general than deciding whether $x \in \mathcal{L}$ for a
decision language, or computing a function: simply taking a sample from a
particular probability distribution constitutes a computation of sorts. Such tasks can
sometimes be given operational roles by embedding them within multi-party
protocols. The complexity of such a protocol may be measured not only in terms of
the computational resources required by each party, but also by the communication
resources required for signalling between the parties, and intermediate storage
requirements. The main example used in Chapter 5 is provided by an interactive
two-party protocol, rather than by a single-party algorithm.

1.2.3 Turing machines 

For classical computing, Turing provided a rigorous foundation by making precise 
definitions for the kinds of machines that might be considered. His machines are 
general enough to simulate many other proposed paradigms. We
next sketch some of the ideas often used when thinking about Turing machines,
although not all of these ideas appear in Turing's original considerations. Equivalence
between different models depends on the notion of algorithmic reduction (expressing 
one task in terms of another), which we will also come to shortly. For our purposes, 
a Turing machine will be an essentially classical device, having a finite (constant) 
number of internal states, and access to a finite (constant) number of 'tapes'. Each 
tape is to be thought of as a one-dimensional array, usually of bits, with certain 
restrictions governing the dynamics relating the tapes and the internal state of the 
machine. On each tape there is to be a pointer, and 'access' to the tape is via the 
pointer. 

The usual idea for Turing machines is that they process 'eager data', that is, input 
data which are all present at the time the machine is activated. There are extensions 
in Domain Theory for more general concepts of data processing, but these will not 
be relevant to the present thesis. Furthermore, we shall be largely glossing over 
the important and thorny issue of error-correction, studying instead the idealised 
'perfect' instantiations of computing machines. 
We will take an input tape to be a read-only tape of bits. There are some
contexts where it is preferable to allow algorithmic input to consist of quantum
data, especially where multi-party computations are being considered and quantum
communication is allowed for. But it will suffice for every topic of this dissertation
to restrict algorithmic input and other communication always to be classical. The
length of the input tape (i.e. size of the input) is usually denoted n.

There should be some convention for the tape so that its input bits are all contiguous,
together with some sensible mechanism for determining where the input tape ends.
The input tape pointer starts at the beginning of the input tape. In fact, general
considerations of this kind apply to all tapes of a Turing machine.

Sometimes we allow for a random tape, for a probabilistic Turing machine, which
is a read-once, read-only tape of arbitrary length, whose contents are set
randomly when the machine commences computation. This models a random number
generator.

There is to be a work tape, which is blank to begin with, but may be written to 
and read from multiple times. The length of the work tape is usually taken to be 
some polynomial in n, but for some 'smaller' computational classes (such as L) it is 
interesting to consider work tapes whose length is limited to being logarithmic in n. 
There are several different ways of extending this notion into the quantum realm, 
and usually it will be more convenient to select a specific description for the task 
in hand. We recommend [72] as the definitive reference for space-bounded quantum 
computation. 
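As an illustration of a logarithmic work tape, the sketch below decides the language of balanced bracket strings, a standard example of a language in L. It is a Python sketch under simplifying assumptions: the read-only input is a string scanned once from left to right, the only work-tape content is a depth counter, and the input head position (itself an $O(\log n)$-bit quantity) is left implicit in the loop. The function name is hypothetical.

```python
from typing import Tuple


def balanced_in_logspace(tape: str) -> Tuple[bool, int]:
    """Decide whether a string over '(' and ')' is balanced.

    The input is only read, left to right; the sole work-tape content is an
    integer depth counter, so the work space used is the counter's bit length,
    which never exceeds the bit length of n = len(tape).
    Returns (accepted, peak work space in bits)."""
    depth = 0
    peak_bits = 0
    for symbol in tape:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:          # a ')' with no matching '('
                return False, peak_bits
        else:                      # symbol outside the alphabet
            return False, peak_bits
        peak_bits = max(peak_bits, depth.bit_length())
    return depth == 0, peak_bits


print(balanced_in_logspace("(())()"))
print(balanced_in_logspace("(" * 512 + ")" * 512))
```

The point of the design is that nothing proportional to n is ever stored: a single counter of about $\log_2 n$ bits suffices.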

There is to be an output tape onto which the results of computation can be written. 
This tape of bits should be write-only, and is generally taken to be of arbitrary 
length. Output is 'achieved' when a machine halts, and we shall be studying the 
complexity of halting machines only. For some computational tasks, such as deciding 
operationally-defined decision languages, only a single bit of output is required. For 
example, we could adopt a convention that if "1" is output within the time-bound 
then the machine is deemed to have "accepted" its input, the input being otherwise 
deemed "rejected". For reductions in general, it is necessary to consider larger 
outputs, so that a Turing machine can act as a pre-processor or post-processor for 
another machine. 

Sometimes we allow for an oracle tape. This enables the machine to have
'black-box' access to some subroutine whose complexity we deliberately wish to place out
of scope of analysis. If the machine is attached to oracle O and has data z written
on its oracle tape at a time when it calls its 'oracle' function, then the contents of
the oracle tape are to be replaced (albeit mysteriously) by the data O(z), in unit
time. Such tapes are not, however, to be used as proxies for work tapes, and care
has to be taken when making rigorous definitions, to avoid hiding complexity in the
oracular interface. The most famous use for oracles is to separate complexity classes
of decision languages which are otherwise inseparable by known analyses. In the
context of quantum computational complexity, their most famous use is probably
in establishing a quadratic query separation between classical and quantum access
to the oracle, as per Grover's algorithm (see [31] and [13]). In the quantum case,
the oracle tape should consist of qubits that can interact with the work tape, and
the oracle action should be defined carefully as a unitary action that reduces to the
classical oracle on computational basis states. As with other quantum
generalisations, it is best to be specific whenever implementation details can make
a significant difference to computational power.
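The quadratic query separation mentioned above can be seen in a small statevector simulation of Grover's algorithm. This Python sketch assumes a single marked item and models each oracle call as a phase flip $|x\rangle \mapsto (-1)^{f(x)}|x\rangle$ (equivalent to the bit-flip oracle acting on an ancilla prepared in $(|0\rangle - |1\rangle)/\sqrt{2}$); the names and parameters are illustrative only. About $\frac{\pi}{4}\sqrt{N}$ queries suffice, whereas classical exhaustive search needs on the order of N queries in the worst case.

```python
import numpy as np


def grover_success(n_qubits: int, marked: int, iterations: int) -> float:
    """Statevector simulation of Grover search over N = 2**n_qubits items.

    Each iteration makes exactly one oracle query, modelled as a phase flip
    |x> -> (-1)^{f(x)} |x> with f(x) = 1 only for x == marked."""
    N = 2 ** n_qubits
    state = np.full(N, 1 / np.sqrt(N))     # uniform superposition H^n |0...0>
    for _ in range(iterations):
        state[marked] *= -1                # one oracle query
        state = 2 * state.mean() - state   # diffusion: inversion about the mean
    return float(state[marked] ** 2)


def grover_queries(n_qubits: int) -> int:
    """Roughly (pi/4) * sqrt(N) queries maximise the success probability."""
    return int(np.floor(np.pi / 4 * np.sqrt(2 ** n_qubits)))


n = 10
q = grover_queries(n)
print(q, grover_success(n, marked=123, iterations=q))
```

For $N = 1024$ this uses 25 queries, against the hundreds a classical search would typically need.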

1.2.4 Algorithmic reduction 

The notion of algorithmic reduction of problems has to do with using one computing 
machine as a pre-processor, or oracle, for another. Reduction is important for
relating different complexity classes: indeed, the most oft-studied complexity classes
tend to be the ones with suitable closure properties under reduction. For example,
it is easy to see (by composition of polynomials) that a Turing machine fitted with
an oracle that decides a given language in P will not, in polynomial time, be able to
compute anything that could not be computed, in polynomial time, by some (other)
ordinary Turing machine not so equipped. We write, for example, $\mathrm{P}^{\mathcal{L}}$ to denote the
analogue of P defined relative to the attachment of an oracle for deciding $\mathcal{L}$. If $\mathcal{L}$ is
itself in P then we have just seen that $\mathrm{P}^{\mathcal{L}} = \mathrm{P}$. More generally, when $\mathrm{A}^{\mathcal{L}} = \mathrm{A}$ for
all $\mathcal{L} \in \mathrm{B}$, we say that B is low for A.
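The 'composition of polynomials' argument can be written out as follows, under the assumed conventions that the oracle machine M runs in time $p(n)$ (so it makes at most $p(n)$ queries, each of size at most $p(n)$), that an ordinary machine D decides $\mathcal{L}$ in time $r(m)$ with r non-decreasing, and that constant-factor overheads such as copying each query to a fresh work area are ignored.

```latex
\begin{align*}
  \mathrm{time}\bigl(M \text{ with each query answered by running } D\bigr)(n)
    &\le \underbrace{p(n)}_{\text{steps of } M}
       + \underbrace{p(n)}_{\text{queries}} \cdot
         \underbrace{r\bigl(p(n)\bigr)}_{\text{one run of } D} \\
    &= n^{O(1)} \qquad \text{whenever } p(n) = n^{O(1)}
       \text{ and } r(m) = m^{O(1)}.
\end{align*}
```

Hence $\mathrm{P}^{\mathcal{L}} \subseteq \mathrm{P}$ for every $\mathcal{L} \in \mathrm{P}$; the reverse inclusion holds because the oracle may simply be ignored.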

Completeness 

There are some languages $\mathcal{L} \in \mathrm{P}$ which have the property that $\mathrm{L}^{\mathcal{L}} = \mathrm{P}$. That
is to say, such an $\mathcal{L}$ is a 'sufficiently complex' language in P that appending
to a suitably designed ordinary Turing machine the 'black-box' ability to decide
$\mathcal{L}$ immediately enables that Turing machine to decide, in logarithmic space, everything
that an ordinary Turing machine would require polynomial time (and presumably
polynomial space) to decide. Such a language is said to be P-complete with respect to
logarithmic-space reduction. Strictly speaking, the determination of completeness
requires not only a specification of the complexity of the preprocessing (in this case
log-space processing) but also a specification of how many times, and with what
adaptive control, the oracle calls are permitted to be made. For the sake of brevity,
we usually have in mind that the preprocessing will be log-space and/or poly-time,
and that a polynomial number of queries to the oracle is permitted, adaptively.

Suppose $\mathcal{L} \in \mathrm{NP}$ denotes an NP-complete language with respect to poly-time
reductions, so that $\mathrm{P}^{\mathcal{L}} \supseteq \mathrm{NP}$. Because examples of this form exist, it is appropriate
and common practice to denote this fact with the notation $\mathrm{P}^{\mathrm{NP}} \supseteq \mathrm{NP}$. Note
that the NP appearing in the 'index' here is effectively a placeholder for any
NP-complete decision language. (Note also that oracle use is sometimes employed not
with a decision language but with an entire function, possibly probabilistic. In the
same manner, function classes can be used in the index as placeholders for particular
complete functions from those classes.)

Simulation 

By using reductions together with an encoding of machine descriptions into
bitstrings, Turing was able to introduce the concept of a universal Turing machine,
a concept of simulation now entirely fundamental (indeed intuitive) to computer
science. The idea is that we can say that U is a universal Turing machine with
respect to some encoding $\mathcal{C}$ if

$$ t = \mathcal{C}(T), \qquad U(t, x) = T(x) $$

whenever t is a text describing a Turing machine T, and x is a putative input string
for T. The 'complexity' of the encoding $\mathcal{C}$ itself is not so important, because it
doesn't depend on any input string x, and so doesn't affect the asymptotics of any
language or function being computed. Thus U is said to be capable of simulating
T, because it can have effectively the same output behaviour as T, for each input
x.

In the case of efficient simulation, the equality sign in the expression above is
supposed to denote the fact that not only are the extrinsic machine outputs to match
(U 'accepts' (t, x) iff T 'accepts' x) but additionally that the consumption of resources
is to match (i.e. the space/time/query/randomness requirements of U are on the
same order, as a function of the size of x, as the corresponding requirements of
T). Thus, in specifying a computational paradigm, we are usually concerned with
establishing some kind of universal machine that is capable of efficiently simulating
a class of machines, with careful accounting being made of the different resources
that are implicitly required.
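To make the relation $U(t, x) = T(x)$ concrete, the following Python sketch treats JSON text as a hypothetical encoding $\mathcal{C}$ of single-tape machines and implements U as an interpreter that decodes t and simulates T step by step. It also returns the number of simulated steps, a crude version of the resource accounting described above: here each step of T costs U only a constant amount of extra work. The names `encode`, `run`, `U` and the example machine `PARITY` are all illustrative, and the step bound is an assumption reflecting the restriction to halting machines.

```python
import json
from typing import Tuple

BLANK = "_"


def encode(machine: dict) -> str:
    """A hypothetical encoding C: serialise a machine description as JSON text."""
    return json.dumps(machine, sort_keys=True)


def run(machine: dict, x: str, max_steps: int = 100_000) -> Tuple[bool, int]:
    """Run a single-tape machine directly; return (accepted, steps used)."""
    tape = dict(enumerate(x))                  # sparse tape, blank elsewhere
    state, head = machine["start"], 0
    for step in range(max_steps):
        if state in (machine["accept"], machine["reject"]):
            return state == machine["accept"], step
        symbol = tape.get(head, BLANK)
        new_state, write, move = machine["delta"][f"{state},{symbol}"]
        tape[head] = write
        head += 1 if move == "R" else -1
        state = new_state
    raise RuntimeError("step bound exceeded")


def U(t: str, x: str) -> Tuple[bool, int]:
    """Universal interpreter: decode t = C(T), then simulate T on x step by step."""
    return run(json.loads(t), x)


# Example machine T: accept iff the input bitstring contains an even number of 1s.
PARITY = {
    "start": "even", "accept": "acc", "reject": "rej",
    "delta": {
        "even,0": ["even", "0", "R"], "even,1": ["odd", "1", "R"],
        "odd,0": ["odd", "0", "R"], "odd,1": ["even", "1", "R"],
        "even,_": ["acc", "_", "R"], "odd,_": ["rej", "_", "R"],
    },
}

print(U(encode(PARITY), "1010"), run(PARITY, "1010"))
```

Matching both the verdict and the step count of the direct run is the sense in which the simulation here is 'efficient'.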
Universality

A machine is said to be universal for BQP if it can be used to decide any BQP 
language, with bounded probability on correctness of decisions, within polynomial 
time, and in this sense efficiently simulate a quantum version of a Turing machine. 
The (theoretic) existence of quantum Turing machines is proven in [14]. This is 
not, of course, the most powerful form of efficient simulation that we could ask of 
a quantum computer. For example, being universal for BQP does not in any way 
guarantee that one can 'manufacture the same states' in polynomial time that are 
'manufactured' in polynomial time by some other quantum computing architecture. 
(In fact, it is a philosophically thorny issue to determine what is meant by the
concept of 'same state' across two different architectures, when no natural isomorphism
between state spaces need exist, and when one's ontology need not even admit the
existence of quantum states as objectively real.) Instead, we can make various
definitions of universality for quantum computation by asking for a device that can
efficiently simulate any other quantum device within the context of a multiparty 
interactive protocol, where the interfaces on such protocols are adequately specified 
as part of the concept of universality. For example, an interface might limit the 
exchange of data to being purely classical, or it might allow for quantum data in 
essentially any form, or it might require that the quantum data be encoded onto 
two spatially colocated bosonic modes of an optical fibre, &c. As before, it is
necessary that the overhead of simulation in terms of resource consumption not be too
large, and so the study of simulations involves the continual audit of many aspects:
physical resources (time, space, &c); non-operational resources (nondeterminism,
oracles, &c); encodings (how U implements T via $\mathcal{C}$; what control signals pass from
software to hardware); and interfaces (what form the inputs and outputs are to take
between rounds of a protocol).

1.3 Circuits 

Quantum circuits are a particularly good way of putting the theory of quantum 
computation on a mathematically rigorous footing, sometimes preferable to quantum 
Turing machines, for example. The computational complexity classes are, by and 
large, unaffected by which paradigm one adopts, yet it is often considered more 
natural to work with circuits as the basic constructs. 

1.3.1 Classically described quantum circuits 

It is convenient to focus our discussion on quantum circuitry on qubits (two-level 
systems), though in practice, circuits can be defined on larger systems or registers. 
The standard idea [47] is to regard a circuit of gates acting on n qubits as a device
for mapping (pure, $2^n$-dimensional) quantum states onto (pure, $2^n$-dimensional)
quantum states. Thus circuits can be composed, and likewise deconstructed into
their individual gate constituents. Their deconstruction should involve gates drawn
from a finite (or simply characterised) alphabet of possible gates, and there are to
be specific rules for using them to construct complexity classes: the most important
rule being that a quantum circuit must be 'handled' via its fully explicit classical
description in terms of its deconstruction into gates.

Quantum languages from classical Turing machines 

Ultimately, the decision languages we study still arise from particular Turing
machines, even when the circuit model is used. For example, the usual approach to
defining BQP would be to take a classical Turing machine, T, which, on receiving
the unary input $1^n$, outputs the classical explicit description of a quantum circuit,
$C_n = C(T(1^n))$. The machine must be bounded in the resources it uses, so that the
(family of) circuits thus produced can be described as uniform. Throughout this
dissertation, we adopt the convention that uniformity implies a logarithmic-space
bound for the pre-processing machine's operation.

Then we can decide whether a given (classical) bitstring x is in $\mathcal{L} = \mathcal{L}(T)$ by
inputting the quantum state $|x\rangle|0\rangle$ (in the computational basis) into circuit $C_{\mathrm{size}(x)+a}$,
where a is a prescribed polynomial function of size(x) describing the circuit's
ancilla requirement, and measuring the first bit of the output in the computational
basis. Provided there is the usual semantic guarantee, as with the definition of BPP,
that the measurement outcome is biased one way or the other with a significant (i.e.
non-negligible) bias, then the direction of this bias is (in theory) tomographically
accessible within polynomial time and space; therefore it can be said to indicate
operationally whether or not $x \in \mathcal{L}$. When this guarantee is present, we say that the
Turing machine T issues the language $\mathcal{L} \in \mathrm{BQP}$ (see §1.2.2), via a uniform family
of circuits.
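The pipeline above (an explicit gate list produced from $1^n$, an input $|x\rangle|0\rangle$, and a measurement of the first output bit) can be sketched as follows. This is a minimal Python statevector sketch under stated assumptions: `describe_circuit` is a hypothetical stand-in for T, the gate alphabet is just {H, CNOT}, and the illustrative language (bitstrings of odd parity) is decided with certainty, whereas a genuine BQP circuit need only exhibit a bounded bias. The gate list is easy to produce, but no log-space implementation of the generator is attempted here.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def describe_circuit(n: int, a: int = 1) -> list:
    """Hypothetical uniform generator playing the role of T(1^n): an explicit
    gate list on n input qubits plus a >= 1 ancillas (qubits n, ..., n+a-1).
    Illustrative language: x has odd parity. The parity is XORed into ancilla n,
    which is then swapped into qubit 0 so that the first output bit is read."""
    gates = [("CNOT", (i, n)) for i in range(n)]
    gates += [("CNOT", (0, n)), ("CNOT", (n, 0)), ("CNOT", (0, n))]  # SWAP(0, n)
    return gates


def run(gates: list, m: int, bits: list) -> np.ndarray:
    """Apply an explicit gate list to the basis state |bits> of m qubits."""
    psi = np.zeros((2,) * m, dtype=complex)
    psi[tuple(bits)] = 1.0
    for name, qubits in gates:
        if name == "H":
            (q,) = qubits
            psi = np.moveaxis(np.tensordot(H, psi, axes=([1], [q])), 0, q)
        elif name == "CNOT":
            c, t = qubits
            idx = [slice(None)] * m
            idx[c] = 1                                  # control = 1 subspace
            sub = psi[tuple(idx)]                       # axis c is dropped here
            axis = t if t < c else t - 1
            psi[tuple(idx)] = np.flip(sub, axis=axis).copy()  # flip the target
        else:
            raise ValueError(f"gate {name!r} not in the alphabet")
    return psi


def accept_probability(x: str, a: int = 1) -> float:
    """Feed |x>|0...0> into the circuit and return Pr[first output bit = 1]."""
    n = len(x)
    psi = run(describe_circuit(n, a), n + a, [int(b) for b in x] + [0] * a)
    return float(np.sum(np.abs(psi[1]) ** 2))


for x in ["0", "1", "101", "1101"]:
    print(x, accept_probability(x))
```

Keeping the circuit as an explicit list of named gates mirrors the rule that a circuit is handled only through its classical description.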

There are a few caveats to make clear. First of all, the output of the classical
Turing machine T should be an explicit description of the quantum circuit to be
implemented, so that no complexity is hidden in this interface; and since the
Turing machine was limited to logarithmic space, and hence polynomial time,
we can be sure that rendering the circuit on state $|x\rangle$ ought theoretically to
be possible within polynomial time and space. Secondly, the input and output of
quantum information are here described explicitly in a computational basis, again to
prevent those interfaces from encoding complexity which would make the definition
sensitive to changes in the details. That said, it is worth observing that since this
definition is so close to the definition of uniform classical circuits for
BPP (see e.g. [49]), we can immediately use the classical theory to see that the
definition is entirely stable under many natural variations. Because
quantum circuits can easily simulate classical ones, provided only that the gate set 
for the quantum circuit is capable of simulating a finite gate set that is classically 
universal, we can encode much of the complexity of the classical Turing machine 
directly into the quantum circuitry. Therefore the definition of BQP remains stable 
even if we allow the classical Turing machine more space, but still with a polynomial 
time bound. Likewise, if we add to the circuit a measurement of all the qubits, 
this output to be processed classically by another polynomial-time-bound Turing 
machine, the class definition remains the same. All of this 'interface stability' is well 
known and documented in the literature (see e.g. [72]), but is especially key to the 
perspective taken in Chapter 4. 

Reversibility 

One difference between the way in which classical circuits are usually constructed 
and the 'standard' way (presented above) for handling quantum circuits lies in the 
detail of how space (i.e. memory) is managed. Classical circuits are usually
presented with ancillas being brought in as necessary and then ditched after use, whereas
quantum circuits are usually presented with all ancillae 'declared' up front. Because
of this, quantum gates are usually taken to be automorphisms (unitary transforms)
on unitary spaces (finite-dimensional Hilbert spaces), rather than more general
quantum operators. Perhaps the reason for this trend has to do with a desire to avoid
having to process mixed (non-pure, entropic) quantum states within a circuit, at 
least for the basic complexity definitions. A change to allow the more 'dynamic' 
use of ancillas would not, of course, affect any of the complexity classes that we 
care to define, provided the rules of quantum mechanics are respected, in ensuring 
that only completely positive trace-preserving maps be employed as gates. But for 
our present purposes, such a change introduces unnecessary complexity, and will be 
avoided. 
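The standard way to fit an irreversible classical gate into this 'ancillae declared up front' picture is to compute its output into a fresh ancilla. The Python sketch below, a minimal illustration with hypothetical function names, checks that the Toffoli gate $\Lambda_2(X)$ permutes the eight 3-bit basis states (so it is reversible, indeed its own inverse) and uses it to compute AND into an ancilla initialised to 0.

```python
from itertools import product


def toffoli(a: int, b: int, c: int) -> tuple:
    """Lambda_2(X) on classical bits: flip the target c iff both controls are 1."""
    return a, b, c ^ (a & b)


# Reversibility: Toffoli permutes the 8 basis states and is its own inverse.
inputs = list(product((0, 1), repeat=3))
outputs = [toffoli(*bits) for bits in inputs]
is_permutation = sorted(outputs) == sorted(inputs)
is_involution = all(toffoli(*toffoli(*bits)) == bits for bits in inputs)


def reversible_and(a: int, b: int) -> tuple:
    """Compute a AND b into an ancilla declared up front in state 0.
    The inputs are kept, so no information is erased."""
    return toffoli(a, b, 0)


print(is_permutation, is_involution, [reversible_and(a, b) for a, b in product((0, 1), repeat=2)])
```

Because the inputs survive alongside the result, the map stays a bijection, which is exactly what allows it to be lifted to a unitary gate.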

1.3.2 Circuit interfaces 

Oracles in quantum circuits 

Quantum circuits are naturally associated with the unitary transforms that they 
induce, which are to be interfaced in a standard way when defining complexity classes 
of decision languages, and likewise of (Boolean) functions more generally. In defining 
these classes, no use is being made of quantum data outside of the quantum circuits. 
One way in which quantum data can be conceptually 'interfaced out' of a circuit
(besides classically as measurement results) is with a quantum oracle, the analogue
of the kind of black-box subroutine used in classical complexity analysis. The most
common way in which these not-necessarily-operational 'devices' are used again
relies on the idea of quantum processing naturally extending classical processing: a
quantum oracle in the circuit model is generally taken to be (for example) a gate that
acts on m + n qubits and maps computational states $|x\rangle|y\rangle$ to $|x\rangle|y + f(x)\rangle$, where
$f : \mathbb{F}_2^m \to \mathbb{F}_2^n$ is a Boolean function and the addition is taken in $\mathbb{F}_2^n$ (bitwise XOR),
so that the gate permutes the computational basis and is therefore unitary. The use
of oracles of this kind enables comparison of quantum and classical complexity classes
by relativisation, and has a natural operational interpretation in the context of
algorithms such as Grover's celebrated quadratic speed-up of exhaustive search [47, 31,
13].
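As a concrete check that the oracle gate just described is unitary, the Python sketch below builds $U_f$ as a $2^{m+n} \times 2^{m+n}$ permutation matrix and verifies that it is orthogonal, self-inverse, and agrees with the classical map $x \mapsto f(x)$ on basis states with $y = 0$. The example function f is arbitrary, and the index convention $x \cdot 2^n + y$ for $|x\rangle|y\rangle$ is an assumption of the sketch.

```python
import numpy as np


def oracle_matrix(f, m: int, n: int) -> np.ndarray:
    """Permutation matrix of U_f |x>|y> = |x>|y XOR f(x)> on m + n qubits.

    f maps integers in range(2**m) to integers in range(2**n); the basis
    state |x>|y> is indexed by x * 2**n + y."""
    dim = 2 ** (m + n)
    U = np.zeros((dim, dim))
    for x in range(2 ** m):
        fx = f(x)
        for y in range(2 ** n):
            U[x * 2 ** n + (y ^ fx), x * 2 ** n + y] = 1.0
    return U


def f(x: int) -> int:
    """An arbitrary example function from 3-bit to 2-bit strings."""
    return (3 * x + 1) % 4


U = oracle_matrix(f, m=3, n=2)
print(np.allclose(U.T @ U, np.eye(32)), np.allclose(U @ U, np.eye(32)))
```

Self-inverseness follows from $y \oplus f(x) \oplus f(x) = y$, so the same gate also uncomputes $f(x)$.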

1.3.3 Universal gate-sets 

It is well known that the two-qubit C-Not gate, together with the single-qubit Pauli
gates, the single-qubit Hadamard gate, and the single-qubit π/8 rotation are
universal for quantum computing (cf. [23]). By this we mean that for any given c-qubit
unitary gate G, and for any real $\epsilon > 0$, one can construct a composite approximation to G
(up to global phase) using $\log(1/\epsilon)^{O(1)}$ gates and ancillae, that is within $\epsilon$ of G under
some standard metric such as trace distance. As an immediate corollary, in the limit
$\epsilon \to 0$, the entire group of (special) unitaries (for any given fixed circuit size above
a fixed lower limit) can be approximated arbitrarily well. Indeed, there is no need to
include the stated poly-logarithmic convergence rate within the definition, since the
Solovay-Kitaev theorem [40, 22] guarantees that if the group spanned by the gates is dense in the
special unitary group then for constant gate size the simulation is efficient in this 
sense. This notion of approximate simulation is slightly more general than asking for 
exact reproduction of arbitrary G with a constant circuit size, which is theoretically 
interesting but perhaps not so operationally (physically) meaningful. 
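To illustrate approximation up to global phase, the Python sketch below exhaustively searches all words in H and the π/8 gate T up to a fixed length for the best approximation of a target rotation. It is a brute-force illustration, not the Solovay-Kitaev algorithm, and it uses the global-phase-invariant distance $\sqrt{1 - |\mathrm{tr}(U^\dagger V)|/d}$ rather than the trace distance mentioned in the text; the target $R_x(0.6)$ and the length bound are arbitrary choices.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])        # the pi/8 gate


def phase_distance(U: np.ndarray, V: np.ndarray) -> float:
    """A global-phase-invariant distance sqrt(1 - |tr(U^dagger V)| / d).
    It vanishes exactly when V = e^{i phi} U for some phase phi."""
    overlap = abs(np.trace(U.conj().T @ V)) / U.shape[0]
    return float(np.sqrt(max(0.0, 1.0 - overlap)))


def rx(theta: float) -> np.ndarray:
    """Rotation exp(-i theta X / 2) about the x axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def best_errors(target: np.ndarray, max_len: int) -> list:
    """Brute force over every word in {H, T}; entry L of the result is the best
    distance achieved by any word of length at most L + 1."""
    frontier = [np.eye(2, dtype=complex)]
    best = phase_distance(target, frontier[0])
    history = []
    for _ in range(max_len):
        frontier = [M @ g for M in frontier for g in (H, T)]
        best = min(best, min(phase_distance(target, M) for M in frontier))
        history.append(best)
    return history


print(best_errors(rx(0.6), 12))
```

Exhaustive search grows as $2^L$, which is why an efficient scheme such as Solovay-Kitaev matters in practice.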

An important further generalisation of the notion of universal gate set emphasises
the key roles of simulation and reduction, rather than rendition of arbitrary
elements from the whole of the special unitary group. The observation [62, 5] that
any probability distribution efficiently producible using the universal gate set quoted
above is also efficiently producible using just the 3-qubit Toffoli gate ($\Lambda_2(X)$)
together with the single-qubit Hadamard gate (H) serves to show that construction
of a group dense in the whole of the special unitary group is unnecessary for
computational purposes. It is readily shown that the group generated by $\Lambda_2(X)$ and H is dense in the
orthogonal group, when three or more qubits are used together with a single ancilla
qubit. Thus we may consider restricting the design of quantum circuitry to use only
these gates $\Lambda_2(X)$ and H, without forfeiting universality. This has the advantage of
making the state space of a machine of a constant number of qubits more amenable
to combinatoric analysis. See [5] for a fuller discussion of these issues.
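Since both $\Lambda_2(X)$ and H have only real entries, every circuit built from them is a real orthogonal matrix. The Python sketch below checks this numerically for a random three-qubit word; it does not, of course, test the density claim, and the random seed and word length are arbitrary choices.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
I2 = np.eye(2)

TOFFOLI = np.eye(8)
TOFFOLI[[6, 7]] = TOFFOLI[[7, 6]]   # swap |110> and |111>: flip target iff both controls are 1


def kron_all(*factors: np.ndarray) -> np.ndarray:
    """Tensor product of the given factors, first factor = most significant qubit."""
    out = np.eye(1)
    for factor in factors:
        out = np.kron(out, factor)
    return out


GATES = [
    TOFFOLI,
    kron_all(H, I2, I2),
    kron_all(I2, H, I2),
    kron_all(I2, I2, H),
]


def random_circuit(length: int, seed: int = 0) -> np.ndarray:
    """Product of `length` gates drawn at random from {Toffoli, H on each qubit}."""
    rng = np.random.default_rng(seed)
    M = np.eye(8)
    for k in rng.integers(len(GATES), size=length):
        M = GATES[k] @ M
    return M


M = random_circuit(50)
print(np.isrealobj(M), np.allclose(M.T @ M, np.eye(8)))
```

Real amplitudes throughout are what make such circuits easier to analyse combinatorially.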

1.4 State of the Art 

1.4.1 Open conjectures 

At the time of writing, there are no known proofs for any of the following
conjectures. Nonetheless, it is convenient to adopt language that assumes these conjectures
tentatively, using phrases such as "sub-universal" as a kind of shorthand for "almost
certainly not universal, unless a major conjecture be falsified."

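$\mathrm{P} \neq \mathrm{BPP}$;

$\mathrm{P} \neq \mathrm{NP}$;

$\mathrm{BPP} \neq \mathrm{BQP}$;

$\mathrm{NP} \not\subseteq \mathrm{BQP}$;

$\mathrm{BQP} \neq \mathrm{PP}$.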

(See, e.g., Thm 6.4 in [4] for the construction showing $\mathrm{BQP} \subseteq \mathrm{PP}$.) All comments about
universality for these kinds of classes should be held in tension with the fact that we
cannot definitively prove that all these complexity classes of decision languages are
not in fact equal.² It is even possible (as far as is known) that some of them may
be independent of the standard sets of axioms used in the formal foundations of
mathematics!

1.4.2 This dissertation 

Our approach is to look for constructions that force 'artificial' limits and restrictions 
on the resources allowed within a computational paradigm, in order to see what kinds 
of structures are necessary to enable probabilistic algorithms to operate within such 
constraints. 

Chapter 2 deals with some paradigms universal for BQP whose interfaces are
sufficiently constrained that their universality is somewhat surprising. There we use
quantum cellular automata with particular symmetries in their dynamics. We show
that even one-dimensional structures with very little control can be understood as
viable architectures for quantum computers (in the absence of noise), giving two
particularly interesting examples.

² These conjectures are not regarded on an 'equal footing' by many researchers. For example,
many more people seem to believe the truth of the third than the truth of the second in the list
above.

Chapter 3 considers mixed quantum states and probability distributions. We explore
the "one pure qubit" model of computation in particular, and establish new
relativisation results as well as a construction for solving $\oplus\mathrm{L}$-complete problems there.
We begin that Chapter with an abstract discussion of probability distributions and 
the idea of post-selection, so as to develop some of the conceptual tools that will be 
of use in the rest of the dissertation. 

Chapter 4 bridges a gap between universality for BQP and certain sub-universal 
structures, studying the Fourier hierarchy of quantum complexity. There we employ 
the circuit model of computing, which is far more commonly used than are direct 
quantum analogues of Turing machines. We provide a description of how to use a 
so-called 'Fourier Sampling Oracle' with a classical pre-processor and post-processor 
to implement solvers for some well-known number-theoretic problems (our solvers 
have novel features, even though efficient quantum solutions to these problems are 
by no means original to this work). 

Chapter 5 bridges the gap between $\oplus\mathrm{L}$ and BQP in a different way, emphasising the
role of inherent temporal structure in a quantum process, using the Clifford-Diagonal
hierarchy. This gives rise to a novel quantum procedure (having apparently no
classically efficient analogue) for performing the role of 'Prover' in a certain
two-party interactive proof game. It is our hope that this algorithm can be appreciated
as being 'the simplest genuinely quantum algorithm'. To that end, we provide some 
analysis of attempts to approximate it classically. 
Chapter 2

Universal Computing with 
Limited Control 



In this chapter, we describe some frameworks for universal quantum computation, 
i.e. paradigms that allow for simulation of BQP in polynomial time, but where 
the 'quantum memory' is laid out in a one-dimensional array of small quantum 
systems (qudits), and control over those qudits is limited in some important regard. 
These constraints take the form of certain spatial and temporal symmetries in the 
dynamics. In §2.1 we consider quantum cellular automata, where the constraints 
require that the 'same processing' happens at every site, at every time-step. In 
§2.2 we consider an architecture based on spin chains, where the constraints prevent 
almost all of the computer from interacting with the outside world. In each case, a 
novel construction is provided. 

The main purpose in each case is to show how highly symmetric systems that lack 
the possibility of local addressing can nonetheless perform powerful computation, if 
implemented without errors. This principle has long been established for classical 
systems, and the more recent theory of quantum computing systems has shown this 
also to carry over to the quantum case; so the main technical contribution of this 
work is to show that it remains valid even for one-dimensional quantum designs,
and with the particular constraints that we consider. The presumption motivating 
this purpose is that by enforcing various symmetries, both spatial and temporal, we 
potentially broaden the range of physical architectures on which one might consider 
implementing a computing paradigm, and by limiting the design dimension to one, 
we aspire to maximise flexibility for potential implementations. 
2.1 Quantum Cellular Automata

In this section¹ we discuss the role of classical control in the context of reversible
quantum cellular automata, giving a one-dimensional universal construction with 
single cell dimension 12. 

2.1.1 Overview 

Cellular automata are, broadly speaking, a way of doing computation whereby data 
are distributed across a computer that has much translational symmetry in its dy- 
namics, so that every 'site' of the computer is doing effectively the same kind of 
processing. Perhaps the most famous example of a classical cellular automaton is 
Conway's Game of Life, whereby each site (cell) holds a single bit, and each such bit 
is modified based on the settings of the neighbouring bits. To generalise this kind 
of idea to a quantum setting, one asks that the update rule for changing the 'state' 
of a cell should have a unitary behaviour. Now when we consider an infinite lattice 
of cells, it is hard to conceive of a single update rule as being an actual mapping 
from states to states, so it is more convenient to think of the rule in the so-called 
Heisenberg picture, whereby its action is understood on the algebra of quantum
observables rather than on Hilbert space vectors. The structure theorem given in [57]
shows that it is always possible to regard a single transition rule as being composed
of two time-slices, in each of which a finite unitary map is applied in parallel, as
shown in Fig. 2.1.
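As a purely classical illustration of the two-time-slice structure (not a QCA, and not the construction of [57]), the sketch below runs a reversible block cellular automaton on a ring of bits. It assumes an even ring length in place of an infinite line, and an arbitrarily chosen bijection `f` on pairs of bits in place of the finite unitary map; `layer`, `step` and `step_inverse` are hypothetical names. Each step applies `f` to the blocks (2k, 2k+1) and then to the shifted blocks (2k+1, 2k+2), so the rule is invertible and commutes with translations by two sites.

```python
import random


def f(a, b):
    # Hypothetical block rule: a bijection on pairs of bits (stands in for a finite unitary).
    return b, a ^ b


def f_inv(u, v):
    # Inverse of f: from (u, v) = (b, a ^ b) recover (a, b) = (u ^ v, u).
    return u ^ v, u


def layer(state, rule, offset):
    """Apply `rule` in parallel to the disjoint blocks (offset + 2k, offset + 2k + 1) of a ring."""
    n = len(state)
    out = list(state)
    for k in range(offset, n + offset, 2):
        i, j = k % n, (k + 1) % n
        out[i], out[j] = rule(state[i], state[j])
    return out


def step(state):
    # Time slice 1 on the even partition, then time slice 2 on the odd partition.
    return layer(layer(state, f, 0), f, 1)


def step_inverse(state):
    # Undo the two slices in the opposite order.
    return layer(layer(state, f_inv, 1), f_inv, 0)


def shift(state, s):
    # Translate the whole configuration by s sites around the ring.
    return state[-s:] + state[:-s]


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(100):
        s = [rng.randint(0, 1) for _ in range(10)]
        assert step_inverse(step(s)) == s                # reversible
        assert step(shift(s, 2)) == shift(step(s), 2)    # commutes with 2-site translations
    print("reversible and translation-covariant")
```

Replacing the bits by qudits and `f` by unitaries (possibly different in the two slices) gives the shape drawn in Fig. 2.1; the fixed block partitions are what make the parallel application well defined.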

We consider within this section two different computational models, both of which 
are quantum cellular automata (QCAs), i.e., distributed systems of lattice cells 
with a spatially homogeneous discrete time dynamical evolution of strictly finite 
propagation speed. These differ from the abstract one-dimensional QCAs of [71],
whose dynamics may be unphysical or may hide arbitrary complexity in the physical
layer, since they need not have any constructive local Hamiltonian representation.
The models we use have an explicit decomposition,
being physical according to the definitions given in [57]. (See also [19] for background 
and the general theory of quantum cellular automata, and [74] for a recent survey 
of QCAs.) 

Our two models differ from each other in the way the program operates, or more 
precisely how the quantum part of the computer interacts with a classical controller, 
being somewhat analogous to the gate model and the Turing machine model respectively.
In the gate model, the classical controller has to be comparatively powerful:

¹ This section is largely taken from my 2006 publication with Torsten Franz and Reinhard Werner [61].

on receiving the input, it will compile the program in a version adapted to the size
of the input, and actually build a quantum circuit to run it. The flexibility of this 
model hence largely resides in the classical controller, and the quantum computer 
hardware is, so to speak, scrapped after each instance. In contrast, a classical uni- 
versal Turing machine takes its flexibility from the possibility of writing both the 
program and the input data on its tape for initialisation. We can apply these ideas
to running a quantum cellular automaton as a computer: on the one hand, we can
use a classical compiler to select a classically described sequence of operations each 
of which is a QCA time step in its own right. Such a machine will be called a classi- 
cally controlled QCA (ccQCA). On the other hand, we can insist that program and 
data are written into the system by the initial preparation, after which the machine 
runs autonomously for a certain number of steps, and with a fixed transition rule 
independent of the problem. The only role left for the classical controller is then 
final measurement to read out the result. It is entirely possible that the absence 
of classical signalling from this second model (except at initialisation and readout), 
coupled with temporal translational symmetry, may prove to have the pragmatic 
value that an implementation can be more readily isolated from decoherence chan- 
nels while it is 'running' its program, thereby enabling lengthy computation without 
explicit error-correction. 

We show constructively that these two ways of programming a QCA (see §2.1.2) 
are computationally equivalent. In the proof we use a structure theorem for cellular 
automata obtained in [57]. This theorem holds in any lattice dimension, and so 
do the ideas of our construction, but we stick to the one-dimensional case as it is
sufficient for bounded-error quantum probabilistic computation. We then use this
equivalence to build a universal autonomous QCA, with an explicitly given transition 
rule, where "universal" means that it simulates the gate model up to polynomial 
overhead. 

The universality of a one-dimensional QCA may be seen as surprising, since recent
research [75] has shown that a one-dimensional cluster-state computer is always
simulable classically in polynomial time, and two dimensions are therefore necessary
for that computing model to transcend BPP. More importantly, certain authors [12]
have stressed the value of a one-dimensional QCA lattice, not for philosophical
reasons of physics but for practical engineering ones: in many cases it is much easier
to design equipment that interfaces with a low-dimensional structure. Besides showing
universality in one lattice dimension, our construction also employs a significantly
smaller cell size than other similar machines discussed in the literature [52, 69].

Here is a summary of the properties of the main construction that we present:

• Universal; a (physically reasonable) paradigm capable of simulating quantum 
circuits, with polynomial overhead in most reasonable measures.

• Discrete space, infinite or unbounded; there is an infinite line (lattice) 
of cells (qudits), q, each cell associated with the algebra of a d-dimensional 
Hilbert space, where d is some constant. 

• Discrete time; unital homomorphisms representing discrete steps, as opposed 
to a Hamiltonian description. (Care must be taken with unitaries, now that 
the underlying Hilbert space is potentially infinite-dimensional.) 

• Non-adaptive; all 'software' is encoded at the input stage, thereafter dynam- 
ics are completely fixed for the universal device. 

• Reversible; update rule for observables is given by a unital homomorphism, 
T, on the (quasi-local) operator algebra, i.e. an algebra homomorphism that 
transforms rank-1 Hermitian projectors to rank-1 Hermitian projectors, con- 
formally preserving orthogonality. (Physically, this can be understood as gen- 
erating no entropy.) 

• One-dimensional; the rank of the lattice is 1, so that each cell has just two 
neighbouring cells, and the cell indices are integers. 

• Spatially symmetric; T commutes with all lattice translations ($q \mapsto q+i$).

• Temporally symmetric; apart from initialisation and readout, the only dy- 
namic is T, repeated over and over. 

• Nearest-neighbour locality; if H is an operator supported on cells belong- 
ing to S, then T(H) is supported on cells of S and their nearest neighbours 
in the lattice. 

Recent work [45] has shown that there is a one-dimensional QCA design in the
continuous-time model that requires only ten levels per cell, rather than the 12 that
we use. It is still an open problem to establish tight bounds in either case. Other 
aspects of the complexity of one-dimensional continuous systems, including the local 
Hamiltonian problem, are discussed further in [6]. 

2.1.2 General construction techniques 

The description of a QCA is most readily given using the Heisenberg picture, which 
is to say that we describe evolutions by how they transform the C*-algebra of local 
quantum observables for the system [57]. This transformation must always have 
some spatial symmetry if it is to be called a QCA. 
From circuit model to ccQCA

Definition 2.1.1. A classically controlled QCA (ccQCA) is modelled as a list of 
unital homomorphisms of the observable algebra associated to an infinite line (lattice) 
of qudits. The symmetry requirement is that there be some full-rank group of lattice 
translations, each element of which commutes with each homomorphism on the list. 
Quantum data stored in the lattice is processed by the sequential application of these 
homomorphisms. 

For universality with regard to L-reductions, we require a log-space
Turing machine that converts the description of an arbitrary quantum circuit (given
in some standard explicit form) to a description of such a list of homomorphisms, 
so that the effect of the quantum circuit is emulated within the cells of the ccQCA 
as it works through applying the homomorphisms on the list. 

Consider the circuit model of quantum computation wherein qubits are present in 
a one-dimensional lattice (also called a 'band'), and any gate may act unitarily on 
just two neighbouring qubits. Such models are known to be BQP-universal when a
sufficiently rich gate set is admitted, as exemplified in [5]. Then there are
various direct ways of implementing such circuits as classically controlled QCAs. For 
example, one could envisage increasing the cell size by a constant factor so that it can 
effectively represent two parallel bands, one (called the 'data band') for encoding the 
qubits of a circuit, and one (called the 'pointer band') for encoding a pointer, much 
like the 'read/write head' of a Turing machine. The transformations of the ccQCA 
could manipulate the location of the pointer and then use that pointer to break 
the spatial symmetry of the dynamics so that individual specific neighbouring data 
qubit pairs (encoded in the 'data band') may be addressed, as required. The data 
band and pointer band can of course be regarded as one single band, by interleaving 
their qubits; at the expense perhaps of having fewer of the translations of the lattice 
commute with the homomorphisms of the ccQCA. 

From ccQCA to QCA 

Definition 2.1.2 (cf. [57], Def. 1). A QCA is modelled by a unital homomorphism
T of the algebra of observables on a lattice of qudit cells (Heisenberg picture). T 
must commute with all lattice translations. 

For a QCA to emulate a ccQCA, we require a log-space Turing machine
that converts the list of homomorphisms associated to the ccQCA into a list of bits 
that can be interpreted as a 'program' to be loaded into the cells of the QCA, along- 
side the data, at time t = 0, so that after some polynomial number of applications 
of T the 'program' will have interacted with the 'data' so as to emulate the desired 

Figure 2.1: Two time-steps of the ccQCA. The clocks indicate the time before and
after application of the unitaries, for comparison with Fig. 2.2.

transformation. We allow for the possibility that the physical location of the 'data' 
in the cells after the applications of T may be different from its starting location, 
but naturally there ought to be no complexity hidden in this translation. 

The main conceptual tool for understanding the decomposition is the QCA structure
theorem (Theorem 6 in [57]). This theorem guarantees the existence of a Margolus
decomposition: two finite unitary operations ($U_i$ and $V_i$) for each of the ccQCA
transition rules, which implement the time evolution by sequential application to
non-overlapping neighbourhoods (as indicated in Fig. 2.1). This saves having to
reason purely in terms of unital homomorphisms. (Note that a single finite unitary
map will not generally suffice for a QCA homomorphism in any discrete model,
because it has a fixed set of eigenvalues and is diagonalisable, and is therefore
close to a unitary map of finite order, independently of the size of the computation.)
The structure theorem applies to
nearest-neighbour ccQCAs; so given an arbitrary ccQCA, one first needs to convert 
it into a ccQCA with nearest-neighbour interaction, which is always possible in a 
trivial fashion by merging cells and enlarging the dimension of the qudits that form 
the lattice. 

Consider an autonomous QCA that consists of a data band representing the one- 
dimensional lattice of the ccQCA being simulated and a program band containing 
information about the sequence of transition functions that the ccQCA would apply. 
Let the ccQCA have access to $k$ different homomorphisms. Then the cell size of the
program band is chosen to be $2k + 1$, enough to distinguish the $2k$ unitaries, allowing
for an extra symbol representing the identity map. Let the time evolution of the
QCA be the product of a 'shift step' shifting the program band two cells past the 
data band, followed by a 'calculation step' performing the required unitary maps 
on pairs of cells of the data band, each controlled by the neighbouring contents of 
the program band. As the program band moves past the data band, each data cell 
(qudit) undergoes the time evolution of the ccQCA being simulated, yet it should be 
noted that different time-steps in the ccQCA evolution are present at one time-step 
of the autonomous QCA (see Fig. 2.2). In accordance with the definition of QCA, 
it is important that there arises no possibility of non-commuting unitaries operating 
on the same cell at any time; note that the unitaries $U_i$ and $V_i$ obtained from the
QCA structure theorem work on different combinations of odd and even cells (see 
Fig. 2.1). Therefore, to circumvent this possibility, one can design the autonomous 
QCA such that the localisation regions of the unitaries are separated by one idle 
cell, as depicted in our example (Fig. 2.2). 

This construction gives an autonomous QCA which, since its dynamics must by 
definition be (spatially) translationally symmetric, has cells composed of three cells 
of the original ccQCA plus one cell from the program band; and it turns out to 
have only nearest-neighbour interactions. This general construction scheme can be
optimised in an explicit situation to reduce the large cell size. Next we give such
an explicit construction by starting from a universal ccQCA with homomorphisms 
that already have sequential structure, so the Margolus decomposition can be omit- 
ted. 

2.1.3 Explicit construction 

In this subsection, we lay out a series of emulations, so as to make clear an explicit 
construction. 

Circuits of "controlled partial-Y"

There is a two-qubit gate which, if not constrained to act always on neighbouring 
qubits but allowed to act on qubits within arbitrary range, serves as universal for 
computation within the standard gate model. Specifically, we use the gate

$$G = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \tfrac{1}{\sqrt2} & -\tfrac{1}{\sqrt2} \\ 0 & 0 & \tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2} \end{pmatrix} \qquad (2.1)$$

defined in the computational basis $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$. It performs a $\pi/4$
rotation on one qubit conditioned on the setting of another, and may equivalently be
written $G_{ab} = \Lambda_a\big(\sqrt{-iY_b}\big)$. To see that this gate is universal, it suffices to show
that by use of simple (computational-basis) ancillae, one can simulate both the Hadamard
and the Toffoli gates (cf. [47]), according to the well-known results of [5].

Proposition 2.1.3. Gates $G = \Lambda(\sqrt{-iY})$ with no nearest-neighbour restriction,
together with ancillae $|0\rangle$ and $|1\rangle$, can emulate ancillae $|+\rangle$ and $|-\rangle$ and all gates in
the set $\{X, Y, Z, H, G^{-1}, \Lambda^2(\pm iY), \Lambda^3(\pm iY), \Lambda^2(Z), \Lambda^2(X)\}$.

Proof. $|+\rangle_a$ and $|-\rangle_a$ are emulated respectively by $G_{ba}|1\rangle_b|0\rangle_a$ and $G_{ba}|1\rangle_b|1\rangle_a$,
and $Y_a$ is emulated by $G^2_{ba}|1\rangle_b$, since global phase is unphysical. $Z_a$ is emulated by
$G^4_{ab}|\psi\rangle_b$ for any $\psi$. $X$ is emulated (without ancillae) by $Y \cdot Z$, and $H_a$ by $G_{ba} \cdot Z_a|1\rangle_b$.
The inverse $G^{-1}$ is equal to $G^7$ because its eigenvalues are all eighth roots of unity.

To emulate $\Lambda^2_{ab}(\pm iY_c)$, we use an ancilla $|0\rangle_p$ which will temporarily hold the 'parity'
of qubits $a$ and $b$. Thus we first need the subroutine $\Lambda_{a\oplus b}\big((\sqrt{-iY_c})^{\pm1}\big)$, emulated by the
sequence

$$G^{-2}_{ap} \cdot G^{-2}_{bp} \cdot G^{\pm1}_{pc} \cdot G^{2}_{ap} \cdot G^{2}_{bp} \cdot |0\rangle_p .$$

Then we have

$$\Lambda^2_{ab}(\pm iY_c) = G^{\mp1}_{ac} \cdot G^{\mp1}_{bc} \cdot \Lambda_{a\oplus b}\big((\sqrt{-iY_c})^{\pm1}\big).$$

The emulation for $\Lambda^3_{abc}(\pm iY_d)$ is rather similar, e.g. it suffices to use the sequence

$$\Lambda^2_{ab}(iY_e) \cdot \Lambda^2_{ec}(\pm iY_d) \cdot \Lambda^2_{ab}(-iY_e) \cdot |0\rangle_e .$$

Then $\Lambda^2_{ab}(Z_c)$ is emulated by $\Lambda^3_{abc}(iY_e)^2|0\rangle_e$, and $\Lambda^2_{ab}(X_c) = H_c \cdot \Lambda^2_{ab}(Z_c) \cdot H_c$. □

(We offer no guarantee that these are the simplest emulations possible. See §1.3.3 
for more on universal circuit gate-sets.) 
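The emulations in this proof can be checked numerically. The sketch below is a plain-Python check (no external libraries) under these assumptions: qubit 0 is the most significant bit of a basis index; $G_{xy}$ is the rotation $R = \tfrac{1}{\sqrt2}\begin{pmatrix}1&-1\\1&1\end{pmatrix} = \sqrt{-iY}$ on qubit $y$ controlled by qubit $x$; and the Toffoli-type gate $\Lambda^2_{ab}(\pm iY_c)$ is built as $G^{\mp1}_{ac} G^{\mp1}_{bc}$ times the parity subroutine $G^{-2}_{ap} G^{-2}_{bp} G^{\pm1}_{pc} G^{2}_{ap} G^{2}_{bp}$ acting on $|0\rangle_p$. The helper names (`gate`, `prod`, `toffoli_like`) are hypothetical.

```python
import math

C = 1 / math.sqrt(2)
R = [[C, -C], [C, C]]        # sqrt(-iY): the pi/4 rotation applied by G (assumed)
R_INV = [[C, C], [-C, C]]    # R^{-1}: a -pi/4 rotation
Z = [[1, 0], [0, -1]]
H = [[C, C], [C, -C]]
I2 = [[1, 0], [0, 1]]
MINUS_IY = [[0, -1], [1, 0]]


def bit(x, k, n):
    """Value of qubit k in basis index x (qubit 0 is the most significant bit)."""
    return (x >> (n - 1 - k)) & 1


def gate(n, tgt, U, ctrls=()):
    """2^n x 2^n matrix applying the 2x2 matrix U to qubit tgt when every control is 1."""
    dim = 2 ** n
    M = [[0j] * dim for _ in range(dim)]
    mask = 1 << (n - 1 - tgt)
    for x in range(dim):
        if not all(bit(x, c, n) for c in ctrls):
            M[x][x] = 1
            continue
        tx = bit(x, tgt, n)
        for ty in (0, 1):
            y = (x | mask) if ty else (x & ~mask)
            M[y][x] = U[ty][tx]
    return M


def G(n, x, y, inverse=False):
    """G_{xy}: the rotation R (or R^{-1}) on qubit y, controlled by qubit x."""
    return gate(n, y, R_INV if inverse else R, (x,))


def prod(*Ms):
    """Operator product Ms[0] Ms[1] ...; the rightmost factor acts first."""
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(out[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B))]
               for i in range(len(out))]
    return out


def close(A, B, cols):
    """Entrywise comparison restricted to the given input columns (ancilla preparations)."""
    return all(abs(A[i][j] - B[i][j]) < 1e-9 for i in range(len(A)) for j in cols)


def toffoli_like(sign):
    """Lambda^2_ab(sign * iY_c) on qubits a, b, c = 0, 1, 2, using a parity ancilla p = 3."""
    a, b, c, p = 0, 1, 2, 3
    inv = sign > 0   # +iY: G^{-1} on (a,c), (b,c) and G on (p,c); -iY: the reverse
    parity = prod(G(4, a, p, True), G(4, a, p, True), G(4, b, p, True), G(4, b, p, True),
                  G(4, p, c, not inv),
                  G(4, a, p), G(4, a, p), G(4, b, p), G(4, b, p))
    return prod(G(4, a, c, inv), G(4, b, c, inv), parity)


ALL2 = range(4)
B_IS_1 = [x for x in range(4) if bit(x, 1, 2)]        # qubit b prepared in |1>
P_IS_0 = [x for x in range(16) if not bit(x, 3, 4)]   # ancilla p prepared in |0>

assert close(prod(*[G(2, 0, 1)] * 8), gate(2, 0, I2), ALL2)              # G^8 = 1
assert close(prod(*[G(2, 0, 1)] * 7), G(2, 0, 1, True), ALL2)            # G^7 = G^{-1}
assert close(prod(G(2, 1, 0), gate(2, 0, Z)), gate(2, 0, H), B_IS_1)     # H_a = G_ba Z_a |1>_b
assert close(prod(G(2, 1, 0), G(2, 1, 0)), gate(2, 0, MINUS_IY), B_IS_1)  # -iY_a ~ Y_a
for sign in (+1, -1):
    target = gate(4, 2, [[0, sign], [-sign, 0]], (0, 1))   # Lambda^2_ab(sign * iY_c)
    assert close(toffoli_like(sign), target, P_IS_0)
print("all emulations check out")
```

Comparing only the columns with the ancilla in its promised state is what "emulate" means here: on those inputs the product must agree with the target gate and return the ancilla unchanged.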

Construct qubit ccQCA 

Consider a ccQCA on a one-dimensional qubit lattice that allows the use of four
different kinds of QCA-homomorphisms as described below, called A, B, C, and 
D. These homomorphisms will be constructed from infinite tensor products of G 
unitaries. To prevent subscripts from becoming unreadable in what follows, we will 
also write G(x, y) for G acting on qubit y controlled on qubit x, which was formerly 
denoted G xy . 

To show how this ccQCA can be used to simulate an arbitrary gate model circuit 
whose gates are all of the kind G (line (2.1)), we will think of the ccQCA's qubits as 
belonging to three interleaved one-dimensional lattices, and label them accordingly
as $d_i, a_i, h_i$ with $i \in \mathbb{Z}$, as illustrated in Fig. 2.3. We will load the $d$-band with input
corresponding to the input of the circuit being simulated, initialising its unused
qubits to $|0\rangle$. The $a$-band will be used as 'ancilla space' and should be initialised
to $|0\rangle$ everywhere. The $h$-band is used to break spatial symmetry of the dynamics,
and should contain a single 'pointer' $|1\rangle$, with the rest of its qubits containing $|0\rangle$.
The four families of homomorphisms we consider here are given explicitly as (tensor)
products of unitaries

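$$A_i = \prod_{x\in\mathbb{Z}} G(h_x, a_{x+i}), \qquad B_i = \prod_{x\in\mathbb{Z}} G(h_x, d_{x+i}),$$
$$C_i = \prod_{x\in\mathbb{Z}} G(d_x, a_{x+i}), \qquad D_i = \prod_{x\in\mathbb{Z}} G(a_x, d_{x+i}). \qquad (2.2)$$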

Proposition 2.1.4. For each $i, j \in \mathbb{Z}$, $i \neq j$, there exists a sequence of homomorphisms
drawn from those of line (2.2) which, when applied to a tri-band lattice of
cells initialised as described above, emulates the unitary gate $G(d_i, d_j)$ on the $d$-band
and restores both the $a$-band and the $h$-band to their initial (separate) configurations.
The complexity of the sequence is constant, though its description complexity grows
logarithmically with $i$ and $j$.

Proof. First note that each of $A_i, \ldots, D_i$ has order 8, since that is the order of the
unitary $G$. Suppose without loss of generality that $h_0$ is the present location of
the pointer. Consider the sequence $T := C_{-i}^7 \cdot B_i^2 \cdot C_{-i}$. The only place it has
net effect (because of the pointer) is between qubits $d_i$ and $a_0$, where it emulates
$T' := G^{-1}(d_i, a_0) \cdot (-iY_{d_i}) \cdot G(d_i, a_0)$. The sequence required by the Proposition is then
taken to be

$$A_0^7 \cdot C_{-i}^7 \cdot B_i^6 \cdot C_{-i} \cdot D_j \cdot C_{-i}^7 \cdot B_i^2 \cdot C_{-i} \cdot A_0, \qquad (2.3)$$

which we can reparse as

$$A_0^{-1} \cdot T^{-1} \cdot D_j \cdot T \cdot A_0,$$

and which, given the promised initial conditions, emulates

$$\big(\sqrt{-iY_{a_0}}\big)^{-1} \cdot T'^{-1} \cdot G(a_0, d_j) \cdot T' \cdot \sqrt{-iY_{a_0}},$$

restoring all other qubits. Since the $a$-band starts out entirely zero, this last line can
be shown (by direct computation of 8-by-8 matrices) to emulate $G(d_i, d_j)$, restoring
$a_0$ also. (To simplify this final computation, it helps to notice that the product
$T' \cdot \sqrt{-iY_{a_0}}$ is given by a 4-by-4 integer matrix, whose action can be perceived using
'classical intuition'.) □
Of course, a clever compiler would find ways of simulating a given circuit that are
more efficient than repeated application of this technique. 
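The final step of the proof of Proposition 2.1.4, a direct computation with 8-by-8 matrices on the qubits $d_i$, $a_0$, $d_j$, can be reproduced with a short plain-Python sketch. It assumes the reading in which $T'$ conjugates $-iY_{d_i}$ (the effect of $B_i^2$ with the pointer at $h_0$) by $G(d_i, a_0)$, that $G$ is the controlled $\pi/4$ rotation $\Lambda(\sqrt{-iY})$, and that qubits are ordered $(d_i, a_0, d_j)$ with $d_i$ most significant; the helper names are hypothetical.

```python
import math

C = 1 / math.sqrt(2)
R = [[C, -C], [C, C]]          # sqrt(-iY), the pi/4 rotation applied by G
R_INV = [[C, C], [-C, C]]
MINUS_IY = [[0, -1], [1, 0]]   # -iY, the effect of B_i^2 on d_i
PLUS_IY = [[0, 1], [-1, 0]]    # its inverse, iY

N = 3
D_I, A_0, D_J = 0, 1, 2        # qubit order (d_i, a_0, d_j); qubit 0 is the most significant bit


def gate(tgt, U, ctrl=None):
    """8x8 matrix applying the 2x2 matrix U to qubit tgt, optionally controlled on qubit ctrl."""
    dim = 2 ** N
    M = [[0j] * dim for _ in range(dim)]
    mask = 1 << (N - 1 - tgt)
    for x in range(dim):
        if ctrl is not None and not (x >> (N - 1 - ctrl)) & 1:
            M[x][x] = 1
            continue
        tx = 1 if x & mask else 0
        for ty in (0, 1):
            M[(x | mask) if ty else (x & ~mask)][x] = U[ty][tx]
    return M


def prod(*Ms):
    """Operator product; the rightmost factor acts first."""
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(out[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B))]
               for i in range(len(out))]
    return out


G_di_a0, G_di_a0_inv = gate(A_0, R, D_I), gate(A_0, R_INV, D_I)
T_prime = prod(G_di_a0_inv, gate(D_I, MINUS_IY), G_di_a0)       # T'
T_prime_inv = prod(G_di_a0_inv, gate(D_I, PLUS_IY), G_di_a0)    # T'^{-1}
sqrt_a0, sqrt_a0_inv = gate(A_0, R), gate(A_0, R_INV)           # effect of A_0 and A_0^7

emulated = prod(sqrt_a0_inv, T_prime_inv, gate(D_J, R, A_0), T_prime, sqrt_a0)
target = gate(D_J, R, D_I)                                      # G(d_i, d_j)

# Compare only inputs with a_0 = |0>, the promised initial condition of the a-band.
a0_zero = [x for x in range(2 ** N) if not (x >> (N - 1 - A_0)) & 1]
assert all(abs(emulated[r][c] - target[r][c]) < 1e-9 for r in range(8) for c in a0_zero)

# T' * sqrt(-iY_{a_0}) has integer entries, as remarked in the proof.
M = prod(T_prime, sqrt_a0)
assert all(abs(v - round(v.real)) < 1e-9 for row in M for v in row)
print("G(d_i, d_j) emulated with a_0 restored")
```

The integer matrix makes the "classical intuition" visible: it sends $|0\rangle_{d_i}|0\rangle_{a_0}$ to $|1\rangle|0\rangle$ and $|1\rangle|0\rangle$ to $-|0\rangle|1\rangle$, so $a_0$ carries a copy of $d_i$ while $G(a_0, d_j)$ acts.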

Construct nearest neighbour qubit ccQCA 

The ccQCA described above employs operations with arbitrarily large neighbourhood.
With additional neighbour-swap operations we can first move two bands to
the required inter-band distance $i$, then apply a cellwise $G$-operation ($A_0$, $B_0$, $C_0$, or
$D_0$) and finally shift back, in order to implement all operations of line (2.2) with a
nearest-neighbour ccQCA. Moreover, the inter-band $G$-operations are quite similar,
which suggests interleaving the three bands into a single qubit band labelled $q_i$, with
$i \in \mathbb{Z}$,

$$(\ldots, q_{-2}, q_{-1}, q_0, q_1, q_2, \ldots) = (\ldots, a_{-1}, h_{-1}, d_0, a_0, h_0, \ldots). \qquad (2.4)$$

By Proposition 2.1.5, a sufficient set of operations is then given by

$$E_j = \prod_{x\in\mathbb{Z}} \mathrm{Swap}(q_{3x+j}, q_{3x+j+1}), \qquad F_j = \prod_{x\in\mathbb{Z}} G(q_{3x+j}, q_{3x+j+1}), \qquad (2.5)$$

for $j \in \{0, 1, 2\}$.

Proposition 2.1.5. With the relabelling of line (2.4), for each $i \in \mathbb{Z}$, for each homomorphism
$A_i, B_i, C_i, D_i$, there exists a sequence of 'short-range' homomorphisms
drawn from those of line (2.5) which, when applied to a single band of qubits, emulates
the required homomorphism. The complexity of the sequence is linear in $|i|$, and
so its description complexity also grows as $O(|i|)$.

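Proof. Note that it suffices to move two bands relative to each other, since the
homomorphisms of line (2.2) act non-trivially on only two bands at a time. Since
all the cases are basically the same, we will illustrate emulation of $A_3$ only:

$$A_3 = (E_2 \cdot E_0 \cdot E_1 \cdot E_2 \cdot E_0) \cdot F_1 \cdot (E_0 \cdot E_2 \cdot E_1 \cdot E_0 \cdot E_2).$$

The swaps on the right bring each $h_x$ next to $a_{x+3}$, at positions $q_{3x+7}$ and $q_{3x+8}$,
where $F_1$ acts; the swaps on the left undo this. □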
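The bookkeeping in this proof (which original qubit sits at each position of the single band after a sequence of $E_j$ swaps) is easy to mechanise. The plain-Python sketch below assumes a finite window of the band in place of the infinite line (sites outside it are empty), that the rightmost factor of a product acts first, and hypothetical function names. It confirms that after $E_0 E_2 E_1 E_0 E_2$ the pairs $(q_{3y+1}, q_{3y+2})$ on which $F_1$ acts hold $(h_{y-2}, a_{y+1})$, i.e. pairs of the form $(h_x, a_{x+3})$ as in the homomorphism $A_3$ of line (2.2), and that the reversed sequence restores the band.

```python
def initial_band(xs):
    """Interleaved single band of line (2.4): q_{3x} = d_x, q_{3x+1} = a_x, q_{3x+2} = h_x."""
    band = {}
    for x in xs:
        band[3 * x] = ("d", x)
        band[3 * x + 1] = ("a", x)
        band[3 * x + 2] = ("h", x)
    return band


def apply_E(band, j):
    """E_j: swap the contents of q_{3x+j} and q_{3x+j+1} for every x (None = empty site)."""
    out = dict(band)
    for p in range(min(band) - 3, max(band) + 3):
        if (p - j) % 3 == 0:
            out[p], out[p + 1] = band.get(p + 1), band.get(p)
    return out


def apply_sequence(band, ops):
    """Apply a product written left to right; the rightmost factor acts first."""
    for j in reversed(ops):
        band = apply_E(band, j)
    return band


CONJ = [0, 2, 1, 0, 2]      # E_0 E_2 E_1 E_0 E_2 (acts before F_1)
CONJ_INV = CONJ[::-1]       # E_2 E_0 E_1 E_2 E_0 (each E_j is an involution)

start = initial_band(range(-5, 6))
moved = apply_sequence(start, CONJ)
for y in range(-2, 3):
    # F_1 acts on the pair (q_{3y+1}, q_{3y+2}); record which original qubits sit there.
    pair = (moved[3 * y + 1], moved[3 * y + 2])
    assert pair == (("h", y - 2), ("a", y + 1)), pair
    print(f"F_1 at y={y} acts as G{pair}")

restored = apply_sequence(moved, CONJ_INV)
assert all(restored[p] == start[p] for p in start)
```

Changing `CONJ` to other swap words is a quick way to find the conjugating sequences for $A_i, \ldots, D_i$ with other values of $i$.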
Construct universal nearest-neighbour QCA

The homomorphisms of the ccQCA described above already work on non-overlapping 
neighbourhoods, and so there is no need for further Margolus decomposition here. 
For our main design of an autonomous nearest-neighbour QCA, we introduce a 
'program band', and focus on minimising the dimension of the individual cells. 

Take a one-dimensional lattice of qudits labelled $C_i$ (for $i \in \mathbb{Z}$) of single cell dimension
$d = 12$, and regard these as incorporating one qutrit cell of a program band $t_i$ with
two qubit cells $q_{2i}$ and $q_{2i+1}$ from the data band of the ccQCA. The cell $C_i$ we define
explicitly as the tensor product

$$C_i = t_i \otimes q_{2i} \otimes q_{2i+1}. \qquad (2.6)$$

(Identification of data cells is indicated in Fig. 2.4.) As before, it is not necessary
to have any 'fine control' over the relative motion of the two sub-bands $t$ and $q$;
rather we simply allow one to pass by the other with an invariant velocity. This
is achieved by decomposing the QCA transformation step into two parts, a unitary
and a shift:

$$U : C \to C \quad \text{acting on every cell simultaneously},$$
$$S : \begin{cases} t_i \mapsto t_{i+1} \\ q_i \mapsto q_{i-1} \end{cases} \quad \text{sliding the bands relatively}. \qquad (2.7)$$

To simulate the nearest-neighbour ccQCA, we will interpret the data band $q_i$ exactly
as before, but the program band $t_i$ must be initialised so as to execute the appropriate
transformations on the data band as the two bands slide past one another. At
initialisation, the cells $i \geq 0$ will be used to hold the non-zero content of the data
band in their qubits, while the cells $i < 0$ will be used to hold the program band
in their qutrits. We will initialise the $t_i$ in the computational basis, and the $U$
operation will be defined to leave these qutrits invariant. Specifically, $t_i = |0\rangle$ will
cause no transformation, $t_i = |1\rangle$ will cause a swap of data between $q_{2i}$ and $q_{2i+1}$,
and $t_i = |2\rangle$ will cause the transformation $G(q_{2i}, q_{2i+1})$ described at line (2.1).
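The single-cell unitary $U$ is block diagonal in the program qutrit, so it can be written down directly. The sketch below assumes the basis ordering $|t_i\rangle \otimes |q_{2i}\rangle \otimes |q_{2i+1}\rangle$ (index $4t + 2q_{2i} + q_{2i+1}$) and takes $G(q_{2i}, q_{2i+1})$ to be the controlled $\pi/4$ rotation $\Lambda(\sqrt{-iY})$ with control $q_{2i}$; `cell_unitary` and `is_unitary` are hypothetical names. It checks that the resulting $12 \times 12$ matrix is unitary and never changes the qutrit.

```python
import math

C = 1 / math.sqrt(2)
IDENTITY4 = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
G = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, C, -C], [0, 0, C, C]]   # control q_{2i}, target q_{2i+1}


def cell_unitary():
    """12x12 unitary: identity, swap or G on the two data qubits, selected by t_i = 0, 1, 2."""
    blocks = [IDENTITY4, SWAP, G]
    U = [[0.0] * 12 for _ in range(12)]
    for t, block in enumerate(blocks):
        for r in range(4):
            for c in range(4):
                U[4 * t + r][4 * t + c] = block[r][c]
    return U


def is_unitary(M, tol=1e-12):
    """Check M^dagger M = 1 entrywise."""
    n = len(M)
    for i in range(n):
        for j in range(n):
            s = sum(M[k][i].conjugate() * M[k][j] for k in range(n))
            if abs(s - (1 if i == j else 0)) > tol:
                return False
    return True


U = cell_unitary()
assert len(U) == 12 == 3 * 2 * 2
assert is_unitary(U)
# The program qutrit is never changed: no amplitude links different values of t_i.
assert all(U[r][c] == 0 for r in range(12) for c in range(12) if r // 4 != c // 4)
# t_i = 1 swaps the data qubits: |1>|0>|1> (index 5) goes to |1>|1>|0> (index 6).
assert U[6][5] == 1
```

Because every block acts only on the two data qubits, the program qutrits stay in computational-basis states throughout, which is what lets them be read as classical instructions.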

Proposition 2.1.6. There is a nearest-neighbour QCA on a one-dimensional lattice
that efficiently emulates each of the six homomorphisms of line (2.5), in each case
using $O(1)$ cells to store the instruction. The description complexity of the QCA
program is therefore linear in the number of ccQCA homomorphisms it emulates,
and the run-time of the QCA is linear in the sum of the program length and data
length.

Note that in our construction (below), the 'instructions' or 'program-segments' are 
given by triples of qutrits which remain in computational basis states through- 
out. 

Proof. Take $U$ to be the 12-by-12 unitary described above, $S$ to be the shift operator
described above that slides program qutrits ($t_i$) past data qubits ($q_i$), and $T = S \cdot U$
to be the nearest-neighbour unital homomorphism of the QCA. Each of the six
homomorphisms of line (2.5) is emulated on all the data qubits ($q_i$) by a specific
pattern of three neighbouring qutrits of program completely sliding past all of the
data qubits. That this $T$ describes a nearest-neighbour homomorphism is immediate
from Fig. 2.4.

The program-segment $|100\rangle$ on $t_{3i}, t_{3i+1}, t_{3i+2}$, as it moves rightwards, will simulate
the homomorphism $E_0$. This is because the $|0\rangle$ initially on $t_{3i+2}$ will hit every
pair $q_{3j+1}, q_{3j+2}$ having no effect, then the $|0\rangle$ initially on $t_{3i+1}$ will hit every pair
$q_{3j+2}, q_{3j+3}$ having no effect, then the $|1\rangle$ initially on $t_{3i}$ will hit every pair $q_{3j}, q_{3j+1}$,
thereby implementing $E_0$. Similarly, the program-segments $|010\rangle$ and $|001\rangle$ will simulate
the homomorphisms $E_2$ and $E_1$ respectively. Likewise, the program-segments
$|200\rangle$, $|020\rangle$, $|002\rangle$ will simulate the homomorphisms $F_0$, $F_2$, $F_1$, respectively. This
is in accordance with the general construction idea outlined in §2.1.2. The cells with
negative index should be initialised with program-segments of these kinds in order to
induce the desired transformations on the data. The cells with non-negative index
should be loaded with the relevant data. The computation output may be read (in
the computational basis) any time after the content of the program band has moved
past the content of the data band. □
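The scheduling argument of this proof can be checked with a small classical simulation. The sketch below is an illustration only, with these assumptions: only the no-op ($t_i = |0\rangle$) and swap ($t_i = |1\rangle$) instructions are modelled, since swapping computational-basis data is classical; data are distinct labels; a finite stretch of lattice is simulated, with empty sites represented by absence from a dictionary; and each clock tick applies $U$ and then $S$, as in $T = S \cdot U$. The names `run` and `segment` are hypothetical. A program-segment $|100\rangle$ on cells $-3, -2, -1$ should act as $E_0$ on the data, $|010\rangle$ as $E_2$ and $|001\rangle$ as $E_1$; when several segments are loaded, the one nearest the data acts first.

```python
def run(program, data, ticks=None):
    """Slide program qutrits (cell c holds t_c) past data qubits (qubit position p).

    program: dict cell -> 0 (no-op) or 1 (swap q_{2c}, q_{2c+1}); cells are negative.
    data: list of labels, initially at qubit positions 0, 1, 2, ...
    Returns the labels in left-to-right order after the program has passed.
    """
    pos = {p: label for p, label in enumerate(data)}
    prog = dict(program)
    if ticks is None:
        ticks = len(data) + 3 * len(program) + 6
    for _ in range(ticks):
        # U: every cell c acts on its own pair of qubit positions (2c, 2c + 1).
        for c, t in prog.items():
            if t == 1:
                a, b = pos.pop(2 * c, None), pos.pop(2 * c + 1, None)
                if b is not None:
                    pos[2 * c] = b
                if a is not None:
                    pos[2 * c + 1] = a
        # S: program moves one cell right (t_c -> t_{c+1}), data one qubit left (q_p -> q_{p-1}).
        prog = {c + 1: t for c, t in prog.items()}
        pos = {p - 1: label for p, label in pos.items()}
    return [pos[p] for p in sorted(pos)]


def segment(bits, offset=-3):
    """Program-segment |bits> on cells offset, offset + 1, offset + 2."""
    return {offset + k: b for k, b in enumerate(bits)}


data = list(range(9))
print(run(segment((1, 0, 0)), data))   # E_0: [1, 0, 2, 4, 3, 5, 7, 6, 8]
print(run(segment((0, 1, 0)), data))   # E_2: [0, 1, 3, 2, 4, 6, 5, 7, 8]
print(run(segment((0, 0, 1)), data))   # E_1: [0, 2, 1, 3, 5, 4, 6, 8, 7]
# Two segments: E_0 nearest the data runs first, then E_1.
print(run({**segment((0, 0, 1), -6), **segment((1, 0, 0), -3)}, data))   # [1, 2, 0, 4, 5, 3, 7, 8, 6]
```

Since the program moves one cell (two qubits) right while the data move one qubit left, each program qutrit meets data pairs whose starting indices advance by three per tick, which is why a single qutrit realises one of the three residue classes of $E_j$.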

To show that the composite simulation is efficient, one needs to estimate the necessary
resources. Consider a quantum circuit (QC) consisting of $\mathrm{Space}_{QC}$ qubit-wires
and $\mathrm{Time}_{QC}$ $G$-gates (assuming no exploitation of parallelism). In the first simulation
step, the resources of the ccQCA depend linearly on the corresponding QC
resources (Prop. 2.1.4). The use of swap gates in the next step increases the time
(Prop. 2.1.5); the encoding of the program into the program band increases the
space, so one ends up with an estimate for the autonomous QCA of

$$\mathrm{Time}_{QCA} = O(\mathrm{Time}_{QC} \cdot \mathrm{Space}_{QC}),$$
$$\mathrm{Space}_{QCA} = O(\mathrm{Time}_{QC} \cdot \mathrm{Space}_{QC}). \qquad (2.8)$$
The resources depend polynomially on the given QC, so the simulation is deemed
efficient for universal BQP simulation. 

2.2 Discrete-time Spin Chains 

In this section² we highlight another design for a novel QCA-based paradigm universal
for BQP. This is based on observations of so-called 'quantum wires' or 'spin
chains', cf. [18, 12, 53, 38, 28, 29].

The main technical contribution of this section is to construct a design of discrete- 
time spin-chain processor which, under classical control, is universal for BQP, but 
which has the special feature that all signal addressing (classical control) goes not to 
the whole machine but only to a tiny part of it (the 'control window'). Accordingly 
it also seems that our encoding of logical qubits within physical spins is novel and 
marginally more efficient (2/3 density) than the more common methods of 'barrier 
qubits' [12]. The context of our design is similar to that of [53], but again there 
signalling is passed to all qubits of the machine rather than only to a small part, 
even though translational invariance of dynamics is enforced. It appears that our 
design construction is, in some sense, 'simplest' amongst the discrete-time spin- 
chain models, and an analogous continuous-time universal construction is presently 
lacking. For example, [76] presents a nice continuous-time 'processor core' model, 
but it nonetheless engages control signals to the bulk of the qubits, rather than to 
a small 'window'. 

It can be argued that continuous-time and discrete-time models for dynamics on a 
lattice of quantum cells are not directly comparable, since the homomorphism for a 
discrete-time QCA is generally given by a Margolus decomposition into alternating 
unitaries (cf. §2.1), whereas for a continuous-time QCA it is given by a local Hamiltonian.
But a general local Hamiltonian on an arbitrary lattice, when executed for
any fixed length of time $\delta t$, is liable to induce a unitary that is not completely local
but rather allows a small (albeit negligible) amount of information to propagate 
arbitrarily far. Conversely, if the alternating unitaries of a Margolus decomposition 
are encoded directly within a Hamiltonian, then that Hamiltonian must oscillate in 
time and not be constant, so that it can represent each of two different unitaries in 
turn. Thus there seems to be no obvious way to transfer results from one context 
directly to the other, and so this work is perhaps not directly comparable with those
studies of spin chains in the context of continuous time.

Here is a quick summary of the properties of the main construction, the autonomous 
QCA, that we present in this section:



² Previously unpublished work; the ideas in this section were presented during a talk given at
Bristol in 2007.
• Universal; a (physically reasonable) paradigm capable of simulating quantum
circuits, with polynomial overhead in most reasonable measures. 

• Discrete space, finite, one-dimensional; a 'chain' of N physical qubits 
'attached' to a single qutrit 'window'. 

• Discrete time, nearest-neighbour locality; a 'clock' homomorphism, com- 
posed of two local unitaries interleaved, which causes finite speed of data prop- 
agation, and is highly symmetric in space and time. 

• Limited dynamics; apart from the clock, all other operations must affect 
only the qutrit window; i.e. all control signals are addressed to $O(1)$ of the
storage space. 

2.2.1 Addressing control in a discrete-time spin-chain 

Our approach here differs from the one of §2.1, and from other similar considerations 
in the literature, in that now we make no assumption about being able to address 
all of the computer to read out and load in data and program, but we do allow 
local time-dependent control of a very small part of the computer. This small part 
is effectively to be considered as the only 'window' that the device has onto the 
outside world, the rest of the machine being isolated from control and environment. 
One might expect it to be necessary to possess a large degree of localised control
during initialisation and output, the very places where decoherence has the
'benevolence' of enabling non-reversible 'entropic' effects such as resetting and
measuring to take place, and to relegate initialisation and output to the first and
final time-phases respectively of the overall computation. In the present design, we
instead constrain 'entropic' effects not to certain temporal phases but to a particular
spatial location: the terminus of a 'quantum wire'. The design remains BQP-universal
despite requiring only a constant number of different kinds of operation
(including addressing), in the same spirit as the designs given recently in [58].

Physical terminology 

It is convenient to borrow some language from the physical architectures proposed 
for implementing 'quantum wires'. Thus we refer to these structures as spin chains, 
since the individual 'low-level qubits' constituting a 'quantum wire' are invariably 
imagined to be (or indeed convincingly implemented as) nuclear spins in the context
of an Ising model, or similar. The idea is to have a chain of sites, indexed by 
{0, 1, 2, . . . , N — 1}, where a two-dimensional Hilbert space is associated to each site, 
to describe a physical qubit there. Such low-level qubits are then termed spins, to 
emphasise two important properties: firstly the idea that these qubits, unlike the
logical qubits to which we shall be coming shortly, are not abstracted very far away
from the underlying physical architecture, and are most likely implemented as the
quantum spin of a spin-½ particle; secondly the idea that there need be no difference
in energy levels between |0) and |1), no explicit method of addressing these qubits 
arbitrarily, and no preferred basis for (unwanted) decoherence. The exception to 
this rule applies at the terminal site, having index 0. This 'window' site is instead 
associated with a three-dimensional space (hence a qutrit), since the use of a larger 
site for the 'window' onto the device will be seen to simplify much of the rest of the 
design of the computing paradigm. (Whether a qubit window would in fact suffice 
here is not presently known.) 

2.2.2 Clocks with graph-symmetry 

The notion of spin chain can be generalised to that of a spin network, according to an 
undirected graph. Although we shan't need graphs more complex than linear arrays, 
it is appropriate to describe the clock dynamics in the more general case. 

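Let $\mathcal{G}(V, \mathcal{E})$ denote an undirected graph each of whose nodes is associated to a
distinct physical qubit. Using $X$ and $Z$ to denote canonical Pauli operators and
subscripts to denote qubit indices, we can define the symmetric discrete-time clock
dynamics for $\mathcal{G}$ according to the following formulae (cf. [53] and §0.3):

$$
\begin{aligned}
H_j &:= e^{i\pi (X_j + Z_j - \sqrt{2})/\sqrt{8}} = \frac{X_j + Z_j}{\sqrt{2}}, \\
V_\mathcal{G} &:= \prod_{j \in V} H_j, \\
\Lambda_j(Z_k) &:= e^{i\pi (1 - Z_j - Z_k + Z_j Z_k)/4} = \frac{1 + Z_j + Z_k - Z_j Z_k}{2}, \\
E_\mathcal{G} &:= \prod_{(j,k) \in \mathcal{E}} \Lambda_j(Z_k), \\
G_\mathcal{G} &:= E_\mathcal{G} \cdot V_\mathcal{G}.
\end{aligned}
\tag{2.9}
$$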

This discrete-time picture is in many ways simpler than the corresponding continuous-time
dynamics, fitting more naturally with a discrete-space model and with standard
notions of computation. For example, there is no need to tune the individual
interaction strengths in order to obtain a uniform flow of data (cf. [18]).

These formulas are reminiscent of the operations used in Graph State computing,
where $\mathcal{G}$ would be a two-dimensional lattice and $G_\mathcal{G}$ would map the all-zero state $|0\rangle$
into a so-called cluster state for measurement-based quantum computing (cf. [51, 54]),
which is again a discrete-time universal computing paradigm. By contrast,
that model uses $G_\mathcal{G}$ only once, and only to establish initial entanglement, not to
distribute control signals nor indeed any other data.

We encode logical qubits using the clock $G_\mathcal{G}$ directly.³ Specifically, focussing on any
spin at the end of a spin chain (a 'leaf' of $\mathcal{G}$), it is easily shown to be necessary to
wait for precisely three clock-ticks before all of the data from that spin has been
transported away, assuming that $\mathcal{G}$ has the local topology of a simple spin chain in
the immediate vicinity of the terminal spin in question. Reason as follows:



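$$
\begin{aligned}
G_\mathcal{G} \cdot X_j \cdot G_\mathcal{G}^{-1} &= Z_j, &
G_\mathcal{G} \cdot Z_j \cdot G_\mathcal{G}^{-1} &= X_j \prod_{(j,k) \in \mathcal{E}} Z_k, \\
G_\mathcal{G}^{3} \cdot X_0 \cdot G_\mathcal{G}^{-3} &= X_1 Z_2, &
G_\mathcal{G}^{3} \cdot Z_0 \cdot G_\mathcal{G}^{-3} &= X_2 Z_3.
\end{aligned}
\tag{2.10}
$$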



(Note that each of $X_1Z_2$ and $X_2Z_3$ commutes with each of $X_0$ and $Z_0$, and so can be
taken to represent a qubit distinct from the one represented by $X_0$ and $Z_0$.)
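The conjugation rules at line (2.10) are easy to check mechanically. The following sketch is an illustration only (the helper names are ours, not the author's): it tracks Pauli operators modulo phase as a pair of bit-vectors $(x, z)$ standing for $\prod_j X_j^{x_j} Z_j^{z_j}$, and applies $X_j \mapsto Z_j$, $Z_j \mapsto X_j \prod_{(j,k)} Z_k$ once per clock-tick. It assumes a simple line of 8 spins, as in Fig. 2.5.

```python
# Pauli operators are tracked modulo phase as bit-vectors (x, z), standing for
# prod_j X_j^{x_j} Z_j^{z_j}.  Helper names are illustrative, not from the text.


def line_graph(n_sites):
    """Neighbour lists for a simple line 0 - 1 - ... - (n_sites - 1)."""
    return [[k for k in (j - 1, j + 1) if 0 <= k < n_sites] for j in range(n_sites)]


def clock_tick(x, z, adj):
    """(x, z) of G P G^-1, using G X_j G^-1 = Z_j and G Z_j G^-1 = X_j prod_k Z_k."""
    new_x = list(z)                 # each Z_j turns into an X_j ...
    new_z = list(x)                 # ... and each X_j turns into a Z_j
    for j, zj in enumerate(z):
        if zj:
            for k in adj[j]:        # ... plus a Z_k on every neighbour k of j
                new_z[k] ^= 1
    return new_x, new_z


def evolve(x, z, adj, ticks):
    """Conjugate by the clock G a total of `ticks` times."""
    for _ in range(ticks):
        x, z = clock_tick(x, z, adj)
    return x, z


def unit(site, n_sites):
    """Bit-vector with a single 1 at `site`."""
    v = [0] * n_sites
    v[site] = 1
    return v


def label(x, z):
    """Readable name such as 'X1Z2'; X_j Z_j on one site is Y_j up to phase."""
    parts = []
    for j, (a, b) in enumerate(zip(x, z)):
        if a:
            parts.append(f"X{j}")
        if b:
            parts.append(f"Z{j}")
    return "".join(parts) or "I"


if __name__ == "__main__":
    n_sites = 8
    adj = line_graph(n_sites)
    zero = [0] * n_sites
    print(label(*evolve(unit(0, n_sites), zero, adj, 3)))  # X1Z2
    print(label(*evolve(zero, unit(0, n_sites), adj, 3)))  # X2Z3
```

The two printed labels reproduce the last two identities at line (2.10); since phases are discarded, the sketch checks only which Pauli appears, not its sign.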

Since it is necessary for logical qubits to be properly distinct from one another, this
naturally suggests taking our logical qubits to be revived in sequence at a given
terminus after every three clock-ticks. Accordingly, we can define the logical qubits
by specifying pairs of anticommuting operators to serve as their 'Pauli basis':

$$
\mathcal{X}_j := G_\mathcal{G}^{3j} \cdot X_0 \cdot G_\mathcal{G}^{-3j}, \qquad
\mathcal{Z}_j := G_\mathcal{G}^{3j} \cdot Z_0 \cdot G_\mathcal{G}^{-3j}.
\tag{2.11}
$$

Calligraphic script is being used to denote operators that define logical qubits, while 
ordinary script is being used to denote operators that define the physical spins. 

Proposition 2.2.1. Let $\mathcal{G}$ be the graph that is a simple line on $N = 3n + 2$ vertices,
and let $G_\mathcal{G}$ be a clock homomorphism on that graph as defined at line (2.9). Then
$N + 1 = 3n + 3$ clock-ticks reverse the data on the physical spins (vertices) of the
graph, and $6n + 6$ clock-ticks therefore revive the initial state perfectly.

Proof. The projective Pauli group on $N$ spins, obtained by quotienting away global
phase, is Abelian, and therefore isomorphic to (the additive group of) the vector
space $\mathbb{F}_2^{2N}$. The operator $G_\mathcal{G}$ is in the Clifford group (that is, its conjugative
action stabilizes the Pauli group), and so its conjugative action on the projective
Pauli group must be a linear endomorphism. Thus it must have a representation
via a $2N$-by-$2N$ matrix over $\mathbb{F}_2$. This is called the stabiliser formalism in
[53]. We can choose to list a basis for the projective Pauli group in the order
$\{X_0, X_1, \dots, X_{N-1}, Z_0, \dots, Z_{N-1}\}$, and then a matrix for $G_\mathcal{G}$ is given in line (2.12)
in $N$-by-$N$ block form, where $A_\mathcal{G}$ is an adjacency matrix for $\mathcal{G}$:

$$
M_\mathcal{G} := \begin{pmatrix} 0 & 1 \\ 1 & A_\mathcal{G} \end{pmatrix}.
\tag{2.12}
$$

³ I have a program that allows one to draw an arbitrary graph, colour its nodes with Pauli
operators, and then evolve the operators at various speeds with the clock $G_\mathcal{G}$. This not only makes
for a novel screen-saver, but also helps make the Gottesman-Knill theorem more intuitive.




Then generate the recurrence $S_{-2} := 1$, $S_{-1} := 0$, $S_0 := 1$, $S_k := S_{k-2} + A_\mathcal{G} \cdot S_{k-1}$,
and observe inductively that

$$
M_\mathcal{G}^k = \begin{pmatrix} S_{k-2} & S_{k-1} \\ S_{k-1} & S_k \end{pmatrix}.
\tag{2.13}
$$



Using this formulation, it is straightforward to check the properties of various spin
networks (graphs) in the context of the (discrete-time) dynamics of $G_\mathcal{G}$ for data flow.
For the present Proposition, it suffices to consider the case where $\mathcal{G}$ is a line on
$N = 3n + 2$ vertices. In that case, it only remains to show that $S_N = 0$ and that
both $S_{N\pm1}$ are equal to the 'reversal' matrix: the permutation matrix that reverses
the order of the vertices. Then $M_\mathcal{G}^{N+1}$ will have the effect of reversing the data on
the vertices, as required.

But $S_k$, as a polynomial in $A_\mathcal{G}$ over $\mathbb{F}_2$, actually is the characteristic polynomial of
the adjacency matrix of the line $\mathcal{G}(k)$ on $k$ vertices, because each is given by the
formula $\mathrm{Det}(XI + A_{\mathcal{G}(k)})$ over $\mathbb{F}_2$, which obeys the same recurrence:

$$
\mathrm{Det}(XI + A_{\mathcal{G}(k)}) = \mathrm{Det}(XI + A_{\mathcal{G}(k-2)}) + X \cdot \mathrm{Det}(XI + A_{\mathcal{G}(k-1)}).
$$

So for $k = N = 3n + 2$ we see that $S_k = 0$ automatically for the linear graph, by the
Cayley-Hamilton theorem. To see then that $S_{N\pm1}$ must be reversal matrices, note that
the group of symmetries of the line is of cardinality 2, so $S_{N\pm1}$ can only be either a
reversal or the identity. That it is in fact a reversal can be seen directly from Fig. 2.5,
where the case $N = 8$ is fully illustrated. A full analysis of this algebra is given in the
appendix of [53]. □
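The matrix formulation at lines (2.12) and (2.13) can also be checked numerically for short lines. The sketch below uses helper names of our own choosing (none appear in the text) and represents $\mathbb{F}_2$ matrices as nested lists of bits acting on column vectors $(x \mid z)$. It builds $M_\mathcal{G}$, runs the recurrence for $S_k$, and compares $M_\mathcal{G}^{N+1}$ with the block-diagonal reversal for $N = 3n + 2$.

```python
# F_2 matrices as nested lists of 0/1; helper names are illustrative only.


def zeros(n):
    return [[0] * n for _ in range(n)]


def eye(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def add(a, b):
    return [[(p + q) % 2 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def mul(a, b):
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) % 2 for col in cols] for row in a]


def power(m, k):
    out = eye(len(m))
    for _ in range(k):
        out = mul(out, m)
    return out


def blocks(tl, tr, bl, br):
    """Assemble a 2x2 block matrix from four N-by-N blocks."""
    return [ra + rb for ra, rb in zip(tl, tr)] + [ra + rb for ra, rb in zip(bl, br)]


def line_adjacency(n):
    return [[int(abs(i - j) == 1) for j in range(n)] for i in range(n)]


def reversal(n):
    return [[int(j == n - 1 - i) for j in range(n)] for i in range(n)]


def clock_matrix(a):
    """M_G = [[0, 1], [1, A_G]] in N-by-N block form, as at line (2.12)."""
    n = len(a)
    return blocks(zeros(n), eye(n), eye(n), a)


def s_sequence(a, k_max):
    """S_{-2} = 1, S_{-1} = 0, S_0 = 1, S_k = S_{k-2} + A S_{k-1} over F_2."""
    n = len(a)
    s = {-2: eye(n), -1: zeros(n), 0: eye(n)}
    for k in range(1, k_max + 1):
        s[k] = add(s[k - 2], mul(a, s[k - 1]))
    return s


if __name__ == "__main__":
    for n in range(3):
        N = 3 * n + 2
        M = clock_matrix(line_adjacency(N))
        R = reversal(N)
        print(N, power(M, N + 1) == blocks(R, zeros(N), zeros(N), R),
              power(M, 6 * n + 6) == eye(2 * N))
```

Repeated multiplication is used instead of fast exponentiation because the matrices here are tiny and the point is transparency.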

Proposition 2.2.2. A linear spin chain having N = 3n+2 nodes will encode exactly 
w = 2n + 2 logical qubits by the rule at line (2.11). 

Proof. One logical qubit is encoded every three clock-ticks, under the encoding rule 
suggested. It takes 6n + 6 clock-ticks to revive the original state (previous Propo- 
sition). Thus in one cycle of 6n + 6 clock ticks there is scope for 2n + 2 logical 
qubits to be encoded. That these logical qubits are independent can be seen by 
more matrix algebra or directly intuited from Fig. 2.5, where the case n = 2 is fully 
illustrated. □ 
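Proposition 2.2.2 can likewise be checked for small $n$. In this sketch (illustrative helper names; Pauli operators modulo phase stored as bit-vectors $(x, z)$) we generate $\mathcal{X}_j = G_\mathcal{G}^{3j} X_0 G_\mathcal{G}^{-3j}$ and $\mathcal{Z}_j = G_\mathcal{G}^{3j} Z_0 G_\mathcal{G}^{-3j}$ for $j = 0, \dots, 2n+1$ on a line of $3n + 2$ spins, and use the symplectic product to test that each pair anticommutes while operators from different pairs commute.

```python
# Pauli operators modulo phase are bit-vectors (x, z); helper names are illustrative.


def line_graph(n_sites):
    return [[k for k in (j - 1, j + 1) if 0 <= k < n_sites] for j in range(n_sites)]


def clock_tick(x, z, adj):
    # G X_j G^-1 = Z_j and G Z_j G^-1 = X_j * prod_{k ~ j} Z_k
    new_x, new_z = list(z), list(x)
    for j, zj in enumerate(z):
        if zj:
            for k in adj[j]:
                new_z[k] ^= 1
    return new_x, new_z


def evolve(pauli, adj, ticks):
    x, z = pauli
    for _ in range(ticks):
        x, z = clock_tick(x, z, adj)
    return x, z


def anticommute(p, q):
    """Symplectic product: 1 if the two Paulis anticommute, 0 if they commute."""
    (px, pz), (qx, qz) = p, q
    return sum(a * d + b * c for a, b, c, d in zip(px, pz, qx, qz)) % 2


def logical_pairs(n):
    """Pairs (X_j, Z_j) for j = 0..2n+1, defined as at line (2.11)."""
    n_sites = 3 * n + 2
    adj = line_graph(n_sites)
    e0 = [1] + [0] * (n_sites - 1)
    zero = [0] * n_sites
    return [(evolve((e0, zero), adj, 3 * j), evolve((zero, e0), adj, 3 * j))
            for j in range(2 * n + 2)]


def check_encoding(n):
    """True if each pair anticommutes and operators from different pairs commute."""
    pairs = logical_pairs(n)
    for j, (xj, zj) in enumerate(pairs):
        if anticommute(xj, zj) != 1:
            return False
        for k, (xk, zk) in enumerate(pairs):
            if j != k and any(anticommute(p, q) for p in (xj, zj) for q in (xk, zk)):
                return False
    return True


if __name__ == "__main__":
    print([check_encoding(n) for n in range(4)])
```

Because conjugation by $G_\mathcal{G}$ preserves commutation relations, it would suffice to check that $\mathcal{X}_j$ and $\mathcal{Z}_j$ avoid site 0 for $0 < j < 2n + 2$; the brute-force loop above simply checks every pair.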

Figure 2.5: The encoding relating $3n + 2$ spins to $2n + 2$ logical qubits on a linear graph,
in the case $n = 2$. (Time vertical, space horizontal.) Each column represents a physical
spin, 8 in this picture. Each row represents a single Pauli operator on the $3n + 2$ spins, and
the row below it represents the same operator conjugated by a clock-tick, $G_\mathcal{G}$. The shaded
rings pick out pairs of anticommuting Pauli operators that serve to define the $2n + 2$ logical
qubits, 6 in this picture. Operators from different pairs commute, and so the logical qubits
are distinct.

This entails a natural encoding density of $w/N = (2n+2)/(3n+2) \approx 2/3$ (logical to physical
ratio), cf. [12]. Unlike the technique of block-coding discussed in [29], our method keeps the logical
qubits from dissociating over a wide area, so that a reasonably standard local error
model could be utilised. That is to say, the spontaneous depolarisation of a spin 
will damage at most two logical qubits at any given time. Another advantage of 
retaining a good degree of locality in the encoding is that it makes tomography 
more straightforward in the case where the implementation is such that one does not 
know a priori how many spins are in the chain. (Having said this, our main concern
is with structural simplicity, and not with adapting the design for error-correction
capability.)

As can be seen from Fig. 2.5, at most two of the logical qubits will be revived on
local spins after any given clock-tick, these spins being the ones at either end of the
chain. (Indeed, it is not hard to show that for any undirected graph $\mathcal{G}$, if a logical
qubit is identified with one of the vertices of $\mathcal{G}$, then there can be at most one other
vertex where that logical qubit is capable of fully reviving under the repeated action
of the clock $G_\mathcal{G}$ alone.)

2.2.3 Window qutrit 

Recall that we intend for the site at one terminus of our spin chain to house a qutrit
rather than a qubit. (Here it is most definitely appropriate to speak of "three energy
levels", because we expressly intend to address control signals to this physical qutrit
directly.) We identify the two lowest energy levels of the qutrit with a logical qubit,
so that the operators $X_0$ and $Z_0$ remain well-defined⁴ for the sake of the clock $G_\mathcal{G}$
(line (2.9)) and the encoding of the other logical qubits (line (2.11)). The third
energy level is reserved as a 'storage space' to enable fine control over the evolution
of the system as a whole.

Now, the computing paradigm is defined by assuming that $G_\mathcal{G}$ evolutions happen
regularly, and between any two $G_\mathcal{G}$ evolutions we are free to apply any physically
plausible operation we please to the qutrit at site 0.

Definition 2.2.3. A program for the spin-chain computer is defined to be a list of
valid qutrit operations (e.g. 3-dimensional unitaries, measurements in the computational
basis, &c.), to be applied on the 'window qutrit', to be interleaved with clock
homomorphisms $G_\mathcal{G}$. Quantum data stored in the spin chain is processed by working
through the list.

The qutrit operations we shall use to obtain universality are listed below : 

• Reset; replace the qutrit with the pure state |0). 

• Measure; obtain a classical trit, collapsing the state of the system according 
to the usual Born rule by projecting onto an energy level. 

• Unitary; apply a 3-dimensional unitary gate to the qutrit. 

For universality with regard to L-reductions, we look for there to be a log-space
Turing machine that converts the description of an arbitrary quantum circuit (given
in some standard explicit form) to a description of such a list of operations, so
that the effect of the quantum circuit is emulated within the spin chain as it works
through applying the operations on the list, interleaved with $G_\mathcal{G}$ evolutions.

Operations are permitted to be adaptive in general, so that a unitary on the list 
might be a function of the result of a measurement previously listed. However, 
just as a quantum circuit usually begins by resetting all of its qubits to zero and 
then delays all other measurements to the end, so the spin chain computer could, in 
general, be expected to emulate such quantum circuits by resetting all of its logical 
qubits to zero at the beginning, and delaying all of its measurements to the end 
also. These are the kinds of emulations that we are interested in, and so we will 
proceed by showing that after resetting to zero and before final measurement, the 
list of operations considered need only contain non-adaptive unitaries, provided that 
the circuit being emulated is likewise constructed. 



⁴ e.g. $X_0 := |0\rangle\langle 1| + |1\rangle\langle 0| + |2\rangle\langle 2|$; $Z_0 := |0\rangle\langle 0| - |1\rangle\langle 1| + |2\rangle\langle 2|$.

2.2.4 Emulating quantum circuits

To initialise logical qubits of the spin chain to zero, simply apply clock-ticks until 
the qubit in question is revived at site 0, then reset it. Final measurements are 
rendered in likewise fashion. 

Similarly, it is trivial to emulate a single-qubit unitary on the spin-chain computer,
because each logical qubit periodically revives at a site where we can access it directly
at the physical layer.

To complete the emulation of a general circuit, we need only show how to imple- 
ment some non-trivial two-qubit unitary in a general position on the logical space. 
Unfortunately this will require (in the worst case) about 18n clock-ticks for nearest- 
neighbour gates, i.e. three cycles of the data, rather than just one, and even more 
clock-ticks for non-nearest neighbour gates. This is because the clock-ticks move the 
logical data in just one direction, whereas a non-trivial two-qubit unitary implicitly 
requires a bidirectional flow of data. (There is possibly some scope for making use of 
the unused n-qubits-worth of space in the spin system to circumvent this slowdown, 
but that is immaterial if we merely wish to show polynomial efficiency of simula- 
tion, and would presumably require a more complex encoding, and perhaps the use 
of partial measurements, &c.) 

Logical nearest-neighbour interactions 

In our first example of emulating a non-trivial two-(logical-)qubit gate, we shall
use $G_\mathcal{G}$ for up to about $12n$ time-steps, and also make use of the third energy level
at site 0. Define $U_0$ on the qutrit to be the unitary operator that exchanges the
top two energy levels at site 0, viz. $|0\rangle\langle 0| + |1\rangle\langle 2| + |2\rangle\langle 1|$. Because of the way that
$E_\mathcal{G}$ has been defined at line (2.9), a single application of $U_0$ effectively switches off
the 'natural' interaction between sites 0 and 1 during a clock-tick $G_\mathcal{G}$ (since $Z_0$ acts as
$+1$ on the level $|2\rangle$, the factor $\Lambda_0(Z_1)$ acts trivially there). Thus, the following identities
are immediate, on states whose qutrit lies within its two lowest energy levels:

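$$
\begin{aligned}
E_\mathcal{G} \cdot (U_0 \cdot E_\mathcal{G} \cdot U_0) &= \Lambda_0(Z_1); \\
G_\mathcal{G} \cdot H_0U_0H_0 \cdot G_\mathcal{G}^{-1} \cdot U_0 &= \Lambda_0(Z_1); \\
H_0 \cdot G_\mathcal{G} \cdot H_0U_0H_0 \cdot G_\mathcal{G}^{-1} \cdot U_0H_0 &= \tfrac{1}{2}\,(1 + X_0 + Z_1 - X_0 Z_1); \\
G_\mathcal{G} \cdot \big( H_0 \cdot G_\mathcal{G} \cdot H_0U_0H_0 \cdot G_\mathcal{G}^{-1} \cdot U_0H_0 \big) \cdot G_\mathcal{G}^{-1}
&= \tfrac{1}{2}\,(1 + Z_0 + Z_0 X_1 Z_2 - X_1 Z_2) \\
&= \tfrac{1}{2}\,(1 + \mathcal{Z}_0 + \mathcal{Z}_0 \mathcal{X}_1 - \mathcal{X}_1).
\end{aligned}
\tag{2.14}
$$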


This latter gate (on logical qubits 0 and 1) is locally equivalent to a logical nearest-neighbour
C-Not gate: composing it with $\mathcal{Z}_0$ gives exactly the C-Not controlled on logical qubit 0
and targeting logical qubit 1. A C-Not gate can be used three times, with appropriate
intervening single-logical-qubit unitaries, to emulate a swap gate (on the same logical
qubits), and thence logical qubits can be swapped about as necessary to emulate
non-nearest-neighbour two-logical-qubit unitaries.
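The first identity at line (2.14) can be confirmed by brute force, because every operator involved merely sends a basis state to a basis state with a sign attached. The sketch below assumes a three-site chain (the window qutrit at site 0, followed by qubits at sites 1 and 2) and the qutrit convention of footnote 4; the chain length and the helper names are illustrative choices of ours. It also shows that, without $U_0$, two applications of $E_\mathcal{G}$ simply cancel.

```python
# Every operator here maps a basis state to (basis state, sign), so we track such
# pairs rather than full matrices.  Chain length and names are illustrative.
from itertools import product

Z_QUTRIT = (1, -1, 1)   # diagonal of Z_0 on the window qutrit (footnote 4)
Z_QUBIT = (1, -1)       # diagonal of Z on an ordinary spin


def cz_phase(za, zb):
    """Eigenvalue of Lambda(Z) = (1 + Z_a + Z_b - Z_a Z_b) / 2."""
    return (1 + za + zb - za * zb) // 2


def E(state):
    """E_G on the chain 0 - 1 - 2: product of the diagonal Lambda(Z) factors."""
    a, b, c = state
    phase = cz_phase(Z_QUTRIT[a], Z_QUBIT[b]) * cz_phase(Z_QUBIT[b], Z_QUBIT[c])
    return state, phase


def U0(state):
    """Exchange the top two levels of the qutrit: |1> <-> |2>."""
    a, b, c = state
    return ((0, 2, 1)[a], b, c), 1


def apply(ops, state):
    """Apply the operator product ops[0] . ops[1] . ... (rightmost factor first)."""
    phase = 1
    for op in reversed(ops):
        state, p = op(state)
        phase *= p
    return state, phase


if __name__ == "__main__":
    for s in product((0, 1), repeat=3):      # qutrit restricted to levels 0 and 1
        out, phase = apply([E, U0, E, U0], s)
        target = cz_phase(Z_QUTRIT[s[0]], Z_QUBIT[s[1]])   # Lambda_0(Z_1) alone
        print(s, out == s, phase == target)
```

The sign contributed by the edge (1, 2) appears twice and cancels, while the edge (0, 1) is seen only once because $U_0$ parks the qutrit's $|1\rangle$ component in $|2\rangle$ during one of the two applications of $E_\mathcal{G}$.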

And so we are able to render efficiently a universal set of operations on the logical 
qubits of the system, using only Gg as a clock, together with access between clock- 
ticks to a qutrit at one end of a spin chain. Of course, blind substitution according 
to the description given here will likely lead to programs in this paradigm that 
could be otherwise compiled in a more optimal fashion, e.g. by taking advantage of 
opportunities to pack more than one simulated gate into each cycle of the data. 

Efficiency of emulation 

We end the section by considering the efficiency of the emulation described. To
improve the simplicity of the reduction, we take $\Lambda(Z)$ to be the principal two-qubit
gate used in quantum circuit design, rather than the more usual choice of $\Lambda(X)$
(C-Not).

Proposition 2.2.4. Let $C$ be a quantum circuit on a line of $w = 2n + 2$ qubits,
composed of nearest-neighbour $\Lambda(Z)$ gates and single-qubit unitaries arranged into
time-slices. Let $d$ be the total number of time-slices in $C$, that is, the depth of $C$.
The emulation of $C$ on the spin-chain computer as described requires $O(w)$ space
and $O(w \cdot d)$ time.

Proof. Set $N = 3n + 2$ and work with a spin-chain processor of that size. In accordance
with reasoning very similar to that used at line (2.14), make the abbreviations

$$
J := Z_0 \cdot G_\mathcal{G} \cdot H_0 \cdot G_\mathcal{G} \cdot H_0U_0H_0, \qquad
K := U_0H_0 \cdot G_\mathcal{G}^{2} \cdot H_0,
$$

and then a logical $\Lambda(Z)$ between qubits $j - 1$ and $j$ may be rendered as

$$
G_\mathcal{G}^{3j} \cdot H_0 \cdot G_\mathcal{G}^{6n+3} \cdot J \cdot G_\mathcal{G}^{6n+5} \cdot K \cdot G_\mathcal{G}^{6n+6-3j},
$$

which effectively involves three 'cycles' of the data structure. While these cycles
are taking place for the emulation of some $\Lambda(Z)$ in $C$, any number of single-qubit
unitaries from the same time-slice of $C$ can be inserted at the appropriate point, and
so contribute nothing to the overall cost of the emulation, as measured in clock-ticks.
Moreover, other $\Lambda(Z)$ gates from the same time-slice of $C$ may also be inserted, for
no additional cost: for example, to emulate $\Lambda(Z)$ between qubits $j - 1$ and $j$ and
also $\Lambda(Z)$ between qubits $k - 1$ and $k$, when $k - j > 2$, we use

$$
\begin{aligned}
&G_\mathcal{G}^{3j} \cdot H_0 \cdot G_\mathcal{G}^{3(k-j)} \cdot H_0 \cdot G_\mathcal{G}^{6n+3-3(k-j)} \cdot J \cdot G_\mathcal{G}^{3(k-j)-2} \\
&\quad \cdot J \cdot G_\mathcal{G}^{6n+5-3(k-j)} \cdot K \cdot G_\mathcal{G}^{3(k-j)-2} \cdot K \cdot G_\mathcal{G}^{6n+6-3k},
\end{aligned}
$$

which still involves only three cycles. (In fact, a closer inspection reveals that
this method is perfectly valid for implementing overlapping nearest-neighbour $\Lambda(Z)$
gates, e.g. $k - j = 1$, so it is possible to implement more of these kinds of gates in
three cycles than can be implemented in one time-slice within the standard quantum
circuit model.) Therefore, since $w = 2n + 2$, a single time-slice can be emulated on
the spin-chain computer using $\sim 3w/2$ physical spins and $\sim 9w$ clock-ticks (three cycles
of $6n + 6 = 3w$ ticks each), and all $d$ time-slices are emulated in $\sim 9w \cdot d$ clock-ticks. □
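The renderings in the proof above can be treated as programs and their clock-ticks tallied. The following sketch is a hypothetical 'compiler' of our own, not the author's: it writes each operator product as a list of tokens, reverses it into execution order (rightmost factor first), and records the time at which each window operation occurs. Under these assumptions both renderings take $18n + 18$ ticks, and the interleaved two-gate program performs exactly the window operations of the two single-gate programs, at the same times.

```python
# Hypothetical "compiler" for the renderings in the proof of Proposition 2.2.4.
# Operator products are written leftmost-last, so the program (execution order)
# is the reversed product.  A token is either ("tick", t), meaning t clock-ticks
# of G, or the name of an operation on the window qutrit.


def J():
    # J := Z_0 . G . H_0 . G . H_0 U_0 H_0   (operator-product order)
    return ["Z0", ("tick", 1), "H0", ("tick", 1), "H0", "U0", "H0"]


def K():
    # K := U_0 H_0 . G^2 . H_0               (operator-product order)
    return ["U0", "H0", ("tick", 2), "H0"]


def render_cz(n, j):
    """Program for a logical Lambda(Z) on qubits j-1 and j, 1 <= j <= 2n+1."""
    prod = [("tick", 3 * j), "H0", ("tick", 6 * n + 3), *J(),
            ("tick", 6 * n + 5), *K(), ("tick", 6 * n + 6 - 3 * j)]
    return prod[::-1]


def render_two_cz(n, j, k):
    """Interleaved program for Lambda(Z) on (j-1, j) and on (k-1, k), j < k."""
    d = 3 * (k - j)
    prod = [("tick", 3 * j), "H0", ("tick", d), "H0", ("tick", 6 * n + 3 - d),
            *J(), ("tick", d - 2), *J(), ("tick", 6 * n + 5 - d),
            *K(), ("tick", d - 2), *K(), ("tick", 6 * n + 6 - 3 * k)]
    return prod[::-1]


def timeline(program):
    """Return ([(time, op), ...], total_ticks) for a program in execution order."""
    t, events = 0, []
    for token in program:
        if isinstance(token, tuple):
            t += token[1]
        else:
            events.append((t, token))
    return events, t


if __name__ == "__main__":
    n = 2
    print(timeline(render_cz(n, 3))[1])          # 18n + 18 = 54
    print(timeline(render_two_cz(n, 1, 4))[1])   # also 54
```

The timeline comparison only checks bookkeeping (tick counts and when each window operation happens); it does not simulate the quantum state.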

Since this upper bound is polynomial, we declare the emulation to be efficient. 
Moreover, it is essentially optimal, due to the following lower bound. 

Proposition 2.2.5. Any emulation that seeks to encode a nearest-neighbour circuit
$C$ of width $w$ and depth $d$ obliviously into a list of operations to be performed on a
constant-sized 'window' in some architecture must require the list in question to be
at least $\Omega(w \cdot d)$ long in the worst case.

Proof. An oblivious encoding must encode each gate of the circuit into the list. We
assume as before that each gate of $C$ is either a $\Lambda(Z)$ between neighbouring qubits or
else a single-qubit unitary drawn from a constant alphabet of single-qubit unitaries.
Then it takes $\Omega(w)$ bits of information to describe each time-slice of $C$ in the worst
case, i.e. when $C$ is densely packed with gates. Assuming for a moment that the
elements on the list are to be drawn from a constant alphabet, so that at most $O(1)$
data can be fed 'through the window' each clock-tick, it will require $\Omega(w)$ of them
to represent the time-slice, and hence $d \cdot \Omega(w)$ to represent the entire circuit.

If, however, the elements on the list are not drawn from a constant alphabet, but 
instead the size of the alphabet grows with w or d even though the size of the data 
structure at the 'window' remains constant, then the elements of the alphabet will 
tend to come arbitrarily close to one another, because the space of bounded operators 
on a finite dimensional Hilbert space is compact. This means that different patterns 
of elements cannot be obliviously simulating different circuits after all, so that the 
emulation strategy must break down after some finite point. Therefore this case 
need not be further analysed. □ 

Suppose now we drop the nearest-neighbour conditions. How do the upper and lower 
bounds change? 
Proposition 2.2.6. Let $C$ be a quantum circuit on a line of $w$ qubits, composed of
$\Lambda(Z)$ gates (not necessarily nearest-neighbour) and single-qubit unitaries arranged
into time-slices. Let $d$ be the depth of $C$. The emulation of $C$ on the spin-chain
computer now requires $O(w^3 \cdot d)$ time.

Proof. If we begin by transferring each $\Lambda(Z)$ gate into its own time-slice, this adds a
factor of $O(w)$ to the time cost in the worst case, i.e. when time-slices tend to start
with $O(w)$ two-qubit gates in them. Then each $\Lambda(Z)$ can be unpacked in the usual
fashion into a product of nearest-neighbour $\Lambda(Z)$ gates interwoven with appropriate
single-qubit $H$ gates. This unpacking increases depth by another factor of $O(w)$,
and now Proposition 2.2.4 applies. □

Proposition 2.2.7. Any emulation that seeks to encode a circuit $C$ of width $w$
and depth $d$ obliviously into a list of operations to be performed on a constant-sized
'window' in some architecture must require the list in question to be at least
$\Omega(w \cdot d \cdot \log w)$ long in the worst case.

Proof. With the assumptions of before regarding use of constant alphabets and
oblivious encodings, the amount of information contained in a time-slice of $C$ must
be $\Omega(w \log w)$, because the first qubit could be involved in a gate with any of $w - 1$
later qubits, then the next qubit with any of $w - 3$ later qubits, and so on. Since there
are $d$ time-slices, the total amount of information that needs to be passed 'through
the window' is $d \cdot \Omega(w \log w)$ in the worst case, and an oblivious emulation, by
definition, knows no way of improving upon this. Thus the list in the emulation
must involve $\Omega(w \cdot \log w \cdot d)$ operations in the worst case. □
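The counting step in this proof is easy to evaluate. For even $w$, the number of ways of pairing up $w$ qubits is $(w-1)(w-3)\cdots 3 \cdot 1$; the sketch below (an illustration only, with `pairing_bits` a name of our choosing) computes the base-2 logarithm of that count, i.e. the number of bits needed just to say which qubits interact in a fully packed time-slice.

```python
# Bits needed to specify which qubits are paired by two-qubit gates in one
# fully packed time-slice of a width-w circuit (w even).  Illustrative only.
import math


def pairing_bits(w):
    """log2((w-1)(w-3)...3*1) for even w."""
    if w % 2:
        raise ValueError("w must be even")
    return sum(math.log2(m) for m in range(w - 1, 0, -2))


if __name__ == "__main__":
    for w in (4, 64, 1024, 65536):
        bits = pairing_bits(w)
        print(w, round(bits, 1), round(bits / (w * math.log2(w)), 3))
```

The ratio in the last column creeps up towards $\tfrac{1}{2}$ as $w$ grows, consistent with the $\Omega(w \log w)$ claim.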

This gap between the upper and lower bounds in the latter case suggests that our
upper-bound strategy may be naive, and not asymptotically optimal.

Chapter 3

Probabilistic and Mixed Computing



Non-determinism in a general sense refers to the idea that there might be more than 
one 'path' that a physical process can/does/might/could take : no unique path 
need be determined. In this chapter, we consider the role of non-determinism in 
computation, focussing on classical probability distributions, computations involving 
mixed states, and the non-operational concept of post-selection. Our main technical 
contribution (§3.3) is to show that in log-space (L) one can produce a quantum
circuit that uses only one pure qubit and solves a ⊕L-complete problem, thereby
generalising work of [8]. But we begin with a more abstract discussion of probability
in computing to motivate the definitions used in §3.3, and also take the opportunity 
to introduce a new way of thinking about post-selection (§3.2) that will have some 
relevance in Chapter 5. 

3.1 Operational Approach to Probabilistic Computing 

This section just recalls some standard definitions and lemmata relevant to compu- 
tation with probability distributions, extending some of the discussion of Chapter 1. 
Definition 3.1.7 — for Bounded Probability decision languages with arbitrary post- 
processing — may be seen as an abstract generalisation of standard definitions for 
classes such as BPP, BQP, &c. 
3.1.1 Elementary definitions

Probability Distributions 

We use discrete probability distributions to model the classical data output of phys- 
ical processes that are designed for computation. A discrete probability distribution 
may be construed as a function, P, having countable domain, mapping to the inter- 
val [0,1]. It is stochastic, which simply means that the sum over the whole domain 
must converge to 1. For our purposes, it will be appropriate generally to take the 
domain to be the set of all finite binary strings, which is denoted $\{0, 1\}^*$. We write
$P(\mathcal{L})$ as shorthand for $\sum_{x \in \mathcal{L} \cap \mathrm{dom}(P)} P(x)$.

The direct product of distributions corresponds to the physical notion of running 
experiments independently in parallel and considering their combined output. 
Definition 3.1.1. If P and Q are two distributions, then 

$$
P \otimes Q : (x, y) \mapsto P(x) \cdot Q(y).
$$

Also, write $P^{\otimes k}$ to denote the direct product of $k$ copies of $P$.

The standard way of describing the distance between two probability distributions
is to use $\ell_p$ additive gaps:

Definition 3.1.2. For $p \in [1, \infty]$, the $\ell_p$ additive gap between distributions $P$ and
$Q$ is given by

$$
\|P - Q\|_p := \Big( \sum_x |P(x) - Q(x)|^p \Big)^{1/p},
$$

where the sum is taken over the union of the two domains. In the case $p = \infty$, a
limit is taken.
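A minimal sketch of Definition 3.1.2 for finite distributions, assuming they are stored as Python dictionaries mapping outcomes to probabilities (our representation, not the text's). Outcomes absent from one dictionary are treated as having probability 0, which realises the sum over the union of the two domains, and `math.inf` selects the limiting case.

```python
import math


def additive_gap(P, Q, p):
    """l_p additive gap ||P - Q||_p of Definition 3.1.2 (p may be math.inf)."""
    support = set(P) | set(Q)                       # union of the two domains
    diffs = [abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in support]
    if p == math.inf:                               # the p -> infinity limit
        return max(diffs, default=0.0)
    return sum(d ** p for d in diffs) ** (1.0 / p)


if __name__ == "__main__":
    P = {"0": 0.5, "1": 0.5}
    Q = {"0": 1.0}
    for p in (1, 2, math.inf):
        print(p, additive_gap(P, Q, p))
```

For a fair coin against a point mass, the $\ell_1$ gap is 1, the $\ell_2$ gap is $\sqrt{1/2}$, and the $\ell_\infty$ gap is $\tfrac{1}{2}$.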

The case p = 1 is called the statistical distance (or total variation distance, up to 
scaling). It has a special interpretation that makes it useful for defining the Bounded 
Probability decision classes. 

Operational nature of the statistical distance 

Here are some basic comments regarding the statistical distance : 

Proposition 3.1.3. Let $D$ be the union of domains of $P$ and $Q$, and let $k$ be a
positive integer. Then

$$
\tfrac{1}{2} \|P - Q\|_1 = \max_{\mathcal{L} \subseteq D} \big( P(\mathcal{L}) - Q(\mathcal{L}) \big),
\qquad
\|P^{\otimes k} - Q^{\otimes k}\|_1 \le k \cdot \|P - Q\|_1.
$$

Proof. The proof of the first line is elementary from the definition. The second line
follows from elementary induction together with the basic inequality

$$
|ac - bd| = |ac - bc + bc - bd| \le |ac - bc| + |bc - bd| \le |a - b| + |c - d|
$$

whenever $a, b, c, d \in [0, 1]$. □
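Both lines of Proposition 3.1.3 can be checked by brute force on small examples. In this sketch (our own helper names; distributions as dictionaries, with product outcomes stored as tuples) `max_event_gap` enumerates every subset of the joint domain, so it is only suitable for tiny domains.

```python
from itertools import chain, combinations, product


def l1(P, Q):
    return sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in set(P) | set(Q))


def prob(P, event):
    return sum(P.get(x, 0.0) for x in event)


def max_event_gap(P, Q):
    """max over all events C (subsets of the joint domain) of P(C) - Q(C)."""
    D = sorted(set(P) | set(Q))
    events = chain.from_iterable(combinations(D, r) for r in range(len(D) + 1))
    return max(prob(P, C) - prob(Q, C) for C in events)


def tensor_power(P, k):
    """P^{(x)k}: outcomes are k-tuples and probabilities multiply."""
    out = {}
    for items in product(P.items(), repeat=k):
        value = 1.0
        for _, p in items:
            value *= p
        out[tuple(x for x, _ in items)] = value
    return out


if __name__ == "__main__":
    P = {"a": 0.7, "b": 0.3}
    Q = {"a": 0.4, "b": 0.6}
    print(0.5 * l1(P, Q), max_event_gap(P, Q))
    print(l1(tensor_power(P, 3), tensor_power(Q, 3)), 3 * l1(P, Q))
```

The maximising event is always $\{x : P(x) > Q(x)\}$, which is why the first line holds with equality.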

In the theory of computation, we usually wish to post-process samples from a prob- 
ability distribution, in order to make a decision and complete a computation. To 
avoid encoding complexity in the post-processing phase, it is appropriate to use some 
simple structure, such as some decision language $\mathcal{L}$ in some 'simple' class (e.g. L),
to compress a probability distribution down onto just two points:

Definition 3.1.4. Let $P$ be a probability distribution with domain $D \subseteq \{0, 1\}^*$, and
let $\mathcal{L} \subseteq \{0, 1\}^*$ be some fixed decision language. Define the fully post-processed
two-outcome distribution $P_\mathcal{L} : \{\top, \bot\} \to [0, 1]$ as follows.

$$
P_\mathcal{L}(\top) := P(D \cap \mathcal{L}), \qquad P_\mathcal{L}(\bot) := P(D \setminus \mathcal{L}).
$$

Proposition 3.1.5. Let Coin be an independent random coin. For any two-outcome
distribution $P$,

$$
\|P - \mathrm{Coin}\|_1 = |P(\top) - P(\bot)|.
$$

The value $(P(\top) - P(\bot))$ is called the bias of $P$; and so we see the magnitude of
the bias of $P$ is given by its statistical distance from a random coin.

Proof. Simply consider $|P(\top) - \tfrac{1}{2}| + |P(\bot) - \tfrac{1}{2}|$, which equals $2\,|P(\top) - \tfrac{1}{2}| = |P(\top) - P(\bot)|$
since $P(\top) + P(\bot) = 1$. □
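A sketch of Definition 3.1.4 and Proposition 3.1.5, assuming a decision language is supplied as a Python predicate on strings (our choice of representation). Following the BPP example later in this section, $\mathcal{L}$ is taken to be the set of strings beginning with 1.

```python
# Two-outcome post-processing (Definition 3.1.4) with True standing for T and
# False for the other outcome; the language is a Python predicate on strings.


def post_process(P, in_language):
    top = sum(p for x, p in P.items() if in_language(x))
    bottom = sum(p for x, p in P.items() if not in_language(x))
    return {True: top, False: bottom}


def bias(two_point):
    return two_point[True] - two_point[False]


def distance_from_coin(two_point):
    """||P_L - Coin||_1 for a two-outcome distribution."""
    return abs(two_point[True] - 0.5) + abs(two_point[False] - 0.5)


if __name__ == "__main__":
    P = {"0": 0.1, "10": 0.4, "11": 0.3, "01": 0.2}
    P_L = post_process(P, lambda x: x.startswith("1"))
    print(P_L, bias(P_L), distance_from_coin(P_L))
```

Here $P_\mathcal{L}(\top) = 0.7$, so the bias is $0.4$, which matches the distance from a fair coin.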

Putting these two ideas together, we immediately see that a non-negligible bias in P 
is necessary if we are to use a reasonable number of copies of P to magnify that bias 
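to something substantial. (That is, if $P$ is very close to a random coin, then $P^{\otimes k}$
will also be close to random for moderate $k$, since $\|P^{\otimes k} - \mathrm{Coin}^{\otimes k}\|_1 \le k \cdot \|P - \mathrm{Coin}\|_1$
by Proposition 3.1.3.) It is a simple corollary of the Hoeffding inequality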
that a non-negligible bias is also sufficient for bias amplification, as shown in the 
following Lemma : 
Lemma 3.1.6 (Chernoff/Hoeffding). If $P$ is a two-outcome distribution with bias
$b$, then one can straightforwardly post-process $P^{\otimes k}$ (using a majority vote, for odd
$k$) to obtain a new distribution whose bias has the same sign as $b$ and magnitude at
least $1 - 2e^{-k|b|^2/2}$.

Proof. The proof follows directly from Hoeffding's standard inequality [34] applied 
to Bernoulli trials. □ 
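To see Lemma 3.1.6 at work, the sketch below computes the exact bias after a majority vote over $k$ samples (odd $k$, bias $b \ge 0$) directly from the binomial distribution, and compares it with the bound $1 - 2e^{-kb^2/2}$. The function names are ours, and `math.comb` requires Python 3.8 or later.

```python
import math


def majority_bias(b, k):
    """Exact bias of the majority of k samples from a distribution of bias b >= 0."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    p = (1 + b) / 2                                  # P(T) for bias b
    correct = sum(math.comb(k, i) * p ** i * (1 - p) ** (k - i)
                  for i in range(k // 2 + 1, k + 1))
    return 2 * correct - 1


def hoeffding_bound(b, k):
    """Lower bound 1 - 2 exp(-k b^2 / 2) from Lemma 3.1.6."""
    return 1 - 2 * math.exp(-k * b * b / 2)


if __name__ == "__main__":
    for k in (1, 11, 101, 301):
        print(k, round(majority_bias(0.1, k), 4), round(hoeffding_bound(0.1, k), 4))
```

For small $k$ the bound is negative and therefore vacuous; it becomes informative once $k$ is of order $1/b^2$.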

Families of distributions lead to decision languages 

When we come to consider not a single physical experiment but a whole family of 
them, we start to think about families of probability distributions also. 

The following definition gives a useful way of creating semantic decision languages 
directly from families of distributions, using the same 'operational' idea. Decision 
languages can thus be derived from a family of probability distributions directly, 
without reference back to the underlying machine or process that takes samples 
from the distributions. 

Definition 3.1.7. Let $\mathcal{P} = \{P_i : i \in I\}$ be a family of probability distributions,
indexed by some totally ordered indexing set $I$. Let $\mathcal{L}$ be some fixed decision language.
Let $c$ be a constant in $(0, \tfrac{1}{2})$. For every index $i \in I$, the value $P_i(\mathcal{L}) = P_{i,\mathcal{L}}(\top)$ lies
in one of the three partitions $[0, c]$, $(c, 1 - c)$, or $[1 - c, 1]$; and we can tri-partition
the set of all indices accordingly. If the middle partition turns out to be empty, then
we define the semantic decision language $\mathrm{BP}_\mathcal{L}(\mathcal{P})$ to be the third partition:

$$
\mathrm{BP}_\mathcal{L}(\mathcal{P}) := \{\, i \in I : P_i(\mathcal{L}) \ge 1 - c \,\},
$$

which is in fact independent of $c$ whenever defined.

(Note that this language is a subset of $I$; therefore, it is appropriate in some circum-
stances to take $I$ to be $\{0, 1\}^*$. But it is also often convenient to have it be $\mathbb{N}$.)
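For a finite family the tri-partition of Definition 3.1.7 is straightforward to compute. This sketch takes the acceptance values $P_i(\mathcal{L})$ as a dictionary from index to probability. That is an assumption made purely for illustration, since a genuine family is infinite. The function returns `None` when the middle partition is non-empty, i.e. when $\mathrm{BP}_\mathcal{L}(\mathcal{P})$ is undefined.

```python
def bp_language(acceptance, c):
    """BP_L(P) from {index: P_i(L)}, or None if some value lies in (c, 1 - c)."""
    if not 0 < c < 0.5:
        raise ValueError("c must lie in (0, 1/2)")
    if any(c < a < 1 - c for a in acceptance.values()):
        return None                                   # middle partition non-empty
    return {i for i, a in acceptance.items() if a >= 1 - c}


if __name__ == "__main__":
    acceptance = {0: 0.9, 1: 0.05, 2: 0.8}
    print(bp_language(acceptance, 0.25), bp_language(acceptance, 0.15))
```

Whenever two choices of $c$ both give a defined answer, the answers agree, which is the independence from $c$ noted in the definition.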

This definition can be used as an alternate way of constructing classes such as BPP
and BQP. For example, a generic BPP decision language can be defined in the form
$\mathrm{BP}_\mathcal{L}(\mathcal{P})$ by fixing some polynomial-time randomized Turing machine $M$ and taking
$P_i$ to be the distribution of the output string of $M$ on input the string $i$; while $\mathcal{L}$
could simply be the set of all strings that begin with a 1. The computational power
of polynomial-time Turing machines is sufficiently great that one need encode no
'complexity' into $\mathcal{L}$ in order to have this $\mathrm{BP}_\mathcal{L}(\mathcal{P})$ be a 'powerful' class. The language
$\mathcal{L}$ serves as a kind of post-processor for the probability distributions, and for those
families of distributions that are significantly weaker than BPP ones, allowing some
additional post-processing of a comparatively complex nature can perhaps provide
a significant boost to the complexity of the ensuing language.
3.1.2 Philosophy of simulation

This short section sets up some background context for the rest of the disserta- 
tion, which is about computing paradigms that are not necessarily universal for 
BQP. 

Let us look again at the constructions of the previous section. A family $\mathcal{P}$ of
probability distributions is more likely to be of general interest if there is (at least
conceptually) some programme of physical experiments whereby "the $i$th experiment
in the programme draws a sample from $P_i$". And it will be of even greater
interest if the resources required to implement the $i$th experiment scale efficiently in
the complexity of (the size of) i. Designs for quantum computers that are severely 
limited — ones for which there is no apparent oblivious strategy for simulating arbi- 
trary quantum circuits — can be analysed by modelling their output as a family of 
probability distributions. 

This perspective leads to a nice way of thinking about simulation. Imagine two
programmes of physical experiments, one ($\mathcal{P}$) whereby the $i$th experiment draws
samples from $P_i$, and one ($\mathcal{Q}$) whereby the $i$th experiment draws samples from $Q_i$.
(If you care to, you might also suppose that our lab technicians assure us that neither
of these programmes uses very much more equipment or time than the other.) Under
what circumstances can we reasonably say that the two programmes simulate one
another? If $P_i = Q_i$ for all $i$, then the simulation is exact. When simulation is not
exact, we need to quantify "how $\mathcal{P}$ is unlike $\mathcal{Q}$". To quantify the difference between
$P_i$ and $Q_i$, one could use one of the $\ell_p$ measures of additive gap (Def. 3.1.2), and we
have already seen that the statistical distance ($\ell_1$) is the most operationally relevant.
Then one must choose whether to be concerned with the worst case for $i$, considering
$\max_i \|P_i - Q_i\|$; or some kind of asymptotic case, $\limsup_{i \to \infty} \|P_i - Q_i\|$ perhaps; or
else some kind of average-case measure.

The following Proposition about asymptotic similarity relates the language definition 
back to the notion of statistical distance. 

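Proposition 3.1.8. If $\mathcal{L}$ is a decision language, and $\mathcal{P}$ and $\mathcal{Q}$ are two families of
probability distributions for which both $\mathrm{BP}_\mathcal{L}(\mathcal{P})$ and $\mathrm{BP}_\mathcal{L}(\mathcal{Q})$ are defined, then the
following implication is valid:

$$
\lim_{i \to \infty} \|P_i - Q_i\|_1 = 0 \implies \mathrm{BP}_\mathcal{L}(\mathcal{P}) \sim \mathrm{BP}_\mathcal{L}(\mathcal{Q}),
$$

where the relation $\sim$ denotes set equality up to finite difference.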

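Proof. Let $c_p$ and $c_q$ be the two constants in $(0, \tfrac{1}{2})$ used in the definitions of $\mathrm{BP}_\mathcal{L}(\mathcal{P})$
and $\mathrm{BP}_\mathcal{L}(\mathcal{Q})$ respectively. Then

$$
\lim_{i \to \infty} \|P_i - Q_i\|_1 = 0
\implies \max_S \big( P_i(S) - Q_i(S) \big) \to 0
\implies |P_i(\mathcal{L}) - Q_i(\mathcal{L})| =: r_i \to 0.
$$

Now if $i \in \mathrm{BP}_\mathcal{L}(\mathcal{Q}) \setminus \mathrm{BP}_\mathcal{L}(\mathcal{P})$ then $P_i(\mathcal{L}) \le c_p$ and $Q_i(\mathcal{L}) \ge 1 - c_q$. This means that
$r_i \ge 1 - c_q - c_p > 0$, and the same bound holds, with the roles of $\mathcal{P}$ and $\mathcal{Q}$ exchanged,
for $i \in \mathrm{BP}_\mathcal{L}(\mathcal{P}) \setminus \mathrm{BP}_\mathcal{L}(\mathcal{Q})$. But if $r_i$ is to tend to zero, then it can only take values at or above
the positive constant $1 - c_q - c_p$ finitely often, and so the symmetric difference of
the two languages must be finite. □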



3.2 Post-selection 

This section discusses the idea of post-selection, which is a non-operational concept. 
It is the idea that when making experiments of a probabilistic nature, one might 
focus on those instances whose outcomes satisfy a certain condition, and then analyse 
the outcomes as though those were the only instances. Of course, the post-selection 
condition may itself be exceptionally rare, which is why the corresponding decision 
languages tend to be very large and 'non-operational' in nature. 

We consider that post-selection is a useful conceptual tool to have in mind when
looking at paradigms for quantum computation, and also potentially for proving results
about classical complexity classes (cf. [1]). Notation is introduced here, mirroring
§3.1 as much as possible, but no use is made of these ideas until Chapter 5.

Following what we did in §3.1, we begin with a definition analogous to the statistical distance, but for post-selective concepts. Then we give a definition that compresses probability distributions down to two-point distributions, this time using post-selective post-processing rather than ordinary post-processing. We proceed by considering again families of distributions, and we discuss what these ideas might mean for simulation. Definition 3.2.5 (for post-selected decision languages with arbitrary post-processing) may be seen as an abstract generalisation of standard definitions for classes such as $\mathrm{BPP}_{\mathrm{path}}$, PostBQP, &c.

Non-standard distance measures 

The following non-standard measure¹ for gaps between probability distributions is offered as a candidate for the analogue of the statistical distance in a post-selective context, justified by its use in Proposition 3.2.6.

Definition 3.2.1 (Cf. Definition 3.1.2). For $p \in [1, \infty]$, the $L^p$ multiplicative gap between distributions $P$ and $Q$ is infinite if $P$ and $Q$ have different support; otherwise it is given by

$$\|P/Q\|_p := \Big( \sum_x \big|\log P(x) - \log Q(x)\big|^p \Big)^{1/p},$$

where the sum is taken over the (mutual) support. In the case $p = \infty$, a limit is taken.

¹ It is defined similarly to the Rényi information divergence, but with important differences.

Like the additive gap measures defined earlier, these multiplicative gap measures are symmetric, in that $\|P/Q\|_p = \|Q/P\|_p$.
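As a concrete reading of Definition 3.2.1, here is a minimal Python sketch of $\|P/Q\|_p$. It assumes distributions are given as dictionaries from outcomes to probabilities, and it takes the natural logarithm; the text does not fix the base, which only rescales the measure. The function name is hypothetical.

```python
import math


def multiplicative_gap(P, Q, p=math.inf):
    """The L^p multiplicative gap ||P/Q||_p of Definition 3.2.1 (natural log assumed).

    P and Q map outcomes to probabilities. Returns math.inf if the supports
    differ; otherwise the L^p norm of |log P(x) - log Q(x)| over the common
    support, where p = math.inf gives the maximum (the limiting case).
    """
    support_P = {x for x, px in P.items() if px > 0}
    support_Q = {x for x, qx in Q.items() if qx > 0}
    if support_P != support_Q:
        return math.inf
    diffs = [abs(math.log(P[x]) - math.log(Q[x])) for x in support_P]
    if p == math.inf:
        return max(diffs, default=0.0)
    return sum(d ** p for d in diffs) ** (1.0 / p)


P = {"0": 0.5, "1": 0.5}
Q = {"0": 0.25, "1": 0.75}
print(multiplicative_gap(P, Q))  # approximately log 2 = 0.693..., attained at outcome "0"
```

The symmetry $\|P/Q\|_p = \|Q/P\|_p$ is immediate here, since only absolute differences of logarithms enter.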

The post-selective distance 

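Here is an elementary remark on the case $p = \infty$, which we henceforth dub the post-selective distance:

Proposition 3.2.2 (Cf. Propos. 3.1.3). Let $P$ and $Q$ be two distributions with the same support $D$, and let $k$ be a positive integer. Then

$$\|P/Q\|_\infty = \max_{\emptyset \neq C \subseteq D} \left| \log \frac{P(C)}{Q(C)} \right|, \qquad \|P^{\otimes k}/Q^{\otimes k}\|_\infty = k \cdot \|P/Q\|_\infty.$$

Proof. In the limit $p \to \infty$, the definition immediately tells us that $\|P/Q\|_\infty = \max_x \big|\log \frac{P(x)}{Q(x)}\big|$. But $\frac{P(C)}{Q(C)} = \sum_{x \in C} P(x) \big/ \sum_{x \in C} Q(x)$ is a weighted average (with weights $Q(x)$) of the ratios $\frac{P(x)}{Q(x)}$ for $x \in C$. It therefore lies between the smallest and largest of them. Hence $\big|\log \frac{P(C)}{Q(C)}\big|$ is maximal when $C$ contains only the singleton $x$ that maximises $\big|\log \frac{P(x)}{Q(x)}\big|$.

The second line also follows from the same observation. The log-ratio of $P^{\otimes k}$ to $Q^{\otimes k}$ at a $k$-tuple is the sum of the $k$ individual log-ratios, and its absolute value is largest when every coordinate is that same maximising $x$.

□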
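Proposition 3.2.2 can be sanity-checked numerically. The sketch below assumes strictly positive distributions stored as Python lists and uses illustrative helper names. It compares $\|P/Q\|_\infty$ against a brute-force maximum over all non-empty events $C$, and then checks additivity under a 3-fold tensor power.

```python
import itertools
import math
import random


def post_selective_distance(P, Q):
    """||P/Q||_inf for strictly positive distributions given as equal-length lists."""
    return max(abs(math.log(p / q)) for p, q in zip(P, Q))


def max_over_events(P, Q):
    """max over non-empty events C of |log(P(C) / Q(C))|, by brute force."""
    n = len(P)
    return max(
        abs(math.log(sum(P[x] for x in C) / sum(Q[x] for x in C)))
        for r in range(1, n + 1)
        for C in itertools.combinations(range(n), r)
    )


def tensor_power(P, k):
    """The k-fold product distribution P^{(x)k}, flattened to a list."""
    return [math.prod(t) for t in itertools.product(P, repeat=k)]


def random_positive_distribution(n, rng):
    weights = [0.1 + rng.random() for _ in range(n)]
    total = sum(weights)
    return [wt / total for wt in weights]


rng = random.Random(1)
for _ in range(50):
    P, Q = random_positive_distribution(5, rng), random_positive_distribution(5, rng)
    d = post_selective_distance(P, Q)
    # First line of Proposition 3.2.2: singletons already achieve the maximum.
    assert abs(max_over_events(P, Q) - d) < 1e-12
    # Second line: the distance is additive under tensor powers.
    assert abs(post_selective_distance(tensor_power(P, 3), tensor_power(Q, 3)) - 3 * d) < 1e-9
```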



Post-selection 

As before, we wish to compress a probability distribution down onto two points, so as to imply a decision. But this time, we condition on some specific type of outcome before taking that decision. So it is necessary to use a nested pair of decision languages $\mathcal{L}_S \subseteq \mathcal{L}_C$ to compress down onto two points, as follows:

Definition 3.2.3 (Cf. Definition 3.1.4). Let $P$ be a probability distribution with support $D \subseteq \{0,1\}^*$, and let $\mathcal{L}_S \subseteq \mathcal{L}_C \subseteq \{0,1\}^*$ be some fixed nested pair of decision languages. If $P(\mathcal{L}_C) \neq 0$ then define the fully post-selected renormalised two-outcome probability distribution $P_{\mathcal{L}_S \subseteq \mathcal{L}_C} : \{\top, \bot\} \to [0,1]$ as follows.

$$P_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\top) := P(\mathcal{L}_S \mid \mathcal{L}_C) = \frac{P(\mathcal{L}_S)}{P(\mathcal{L}_C)}, \qquad P_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\bot) := P(\mathcal{L}_C \setminus \mathcal{L}_S \mid \mathcal{L}_C) = \frac{P(\mathcal{L}_C \setminus \mathcal{L}_S)}{P(\mathcal{L}_C)}.$$

This interpretation of ratios of probabilities as conditionals is due to Bayes's Theorem.

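As before, the bias of $P_{\mathcal{L}_S \subseteq \mathcal{L}_C}$ can be defined as $P_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\top) - P_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\bot)$.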
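To make Definition 3.2.3 concrete, the following Python sketch computes $P_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\top)$, $P_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\bot)$ and the bias for a finite distribution over strings. Passing the languages as membership predicates and using exact `Fraction` arithmetic are choices made for this illustration. The example languages are the strings beginning "11" and the strings beginning "1".

```python
from fractions import Fraction


def post_selected(P, in_S, in_C):
    """Two-outcome distribution of Definition 3.2.3 and its bias.

    P maps strings to probabilities; in_S and in_C are membership predicates
    for nested languages L_S contained in L_C. Returns (P(top), P(bottom), bias).
    """
    if any(in_S(s) and not in_C(s) for s in P):
        raise ValueError("languages must be nested: L_S inside L_C")
    p_C = sum(p for s, p in P.items() if in_C(s))
    if p_C == 0:
        raise ValueError("P(L_C) = 0, so the post-selected distribution is undefined")
    p_S = sum(p for s, p in P.items() if in_S(s))
    top = p_S / p_C               # P(L_S | L_C)
    bottom = (p_C - p_S) / p_C    # P(L_C minus L_S | L_C)
    return top, bottom, top - bottom


P = {"00": Fraction(1, 2), "01": Fraction(1, 8), "10": Fraction(1, 4), "11": Fraction(1, 8)}
top, bottom, bias = post_selected(P, lambda s: s.startswith("11"), lambda s: s.startswith("1"))
print(top, bottom, bias)  # 1/3 2/3 -1/3
```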

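Note that it is possible to amplify the bias of $P_{\mathcal{L}_S \subseteq \mathcal{L}_C}$ by taking its $k$-fold product and appealing to the majority-vote method of Lemma 3.1.6. In fact, we can do slightly better in the post-selection case and obtain a better amplification rate.

Lemma 3.2.4 (Cf. Lemma 3.1.6). Suppose $P$ is some probability distribution, $\mathcal{L}_S \subseteq \mathcal{L}_C$ are languages, and $P_{\mathcal{L}_S \subseteq \mathcal{L}_C}$ has bias $b$. Then for any positive integer $k$ of our choosing, we can take $Q = P^{\otimes k}$ and modified languages $\mathcal{L}'_S \subseteq \mathcal{L}'_C$ such that the bias of $Q_{\mathcal{L}'_S \subseteq \mathcal{L}'_C}$ has the same sign as $b$ and has magnitude at least $1 - e^{-k|b|}$.

Proof. To prove this, take $\mathcal{L}'_S = (\mathcal{L}_S)^{\otimes k}$ and $\mathcal{L}'_C = (\mathcal{L}_C \setminus \mathcal{L}_S)^{\otimes k} \cup (\mathcal{L}_S)^{\otimes k}$. Write $P(\top) = \frac{1+b}{2}$ and $P(\bot) = \frac{1-b}{2}$ for the two-outcome distribution $P_{\mathcal{L}_S \subseteq \mathcal{L}_C}$. Plugging these into the definition, the new bias is seen to be

$$\frac{P(\top)^k - P(\bot)^k}{P(\top)^k + P(\bot)^k} = \frac{(1+b)^k - (1-b)^k}{(1+b)^k + (1-b)^k}. \tag{3.1}$$

Now take $0 < b < 1$ and put $\rho = \left(\frac{1-b}{1+b}\right)^k$. Line (3.1) equals $\frac{1-\rho}{1+\rho}$, and the inequality $\frac{1-\rho}{1+\rho} \ge 1 - e^{-kb}$ rearranges to

$$2e^{kb} \le 1 + \left(\frac{1+b}{1-b}\right)^k, \tag{3.2}$$

so we need only show (3.2). The case $-1 < b < 0$ is symmetrically the same.

Line (3.2) can be established analytically. It clearly holds (with equality) at $b = 0$. The first derivative of the left side is $2ke^{kb}$, while the first derivative of the right side is $\frac{2k}{1-b^2}\left(\frac{1+b}{1-b}\right)^k$. So it suffices to show (for positive $b$ up to 1)

$$e^{kb} \le \frac{1}{1-b^2}\left(\frac{1+b}{1-b}\right)^k. \tag{3.3}$$

This is easily seen for $k = 0$. For other (positive) values of $k$, since $\frac{1}{1-b^2} \ge 1$, it suffices if

$$e^{b} \le \frac{1+b}{1-b}. \tag{3.4}$$

This last line follows immediately (term by term) from the power series expansions $e^b = \sum_{n \ge 0} b^n/n!$ and $\frac{1+b}{1-b} = 1 + \sum_{n \ge 1} 2b^n$.

□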
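The following Python sketch checks both halves of the proof of Lemma 3.2.4 numerically. First, it confirms that the product construction of $\mathcal{L}'_S$ and $\mathcal{L}'_C$ reproduces the bias formula (3.1). Second, it confirms that (3.1) is at least $1 - e^{-kb}$ on a grid of $b \in (0,1)$ and $k \le 40$. A grid check is only a sanity test, not a substitute for the analytic argument. The helper names are hypothetical.

```python
import itertools
import math


def amplified_bias(b, k):
    """Right-hand side of line (3.1): the bias after k-fold post-selection."""
    up, down = (1 + b) ** k, (1 - b) ** k
    return (up - down) / (up + down)


def product_construction_bias(p_S, p_C_minus_S, k):
    """Bias obtained from the languages L'_S, L'_C in the proof of Lemma 3.2.4.

    Each of k independent coordinates lands in L_S with probability p_S, in
    L_C minus L_S with probability p_C_minus_S, and outside L_C otherwise.
    L'_S = all coordinates in L_S; L'_C = L'_S or all coordinates in L_C minus L_S.
    """
    outcomes = {"S": p_S, "C-S": p_C_minus_S, "out": 1 - p_S - p_C_minus_S}
    q_S = q_C = 0.0
    for t in itertools.product(outcomes, repeat=k):
        q = math.prod(outcomes[o] for o in t)
        if all(o == "S" for o in t):
            q_S += q
            q_C += q
        elif all(o == "C-S" for o in t):
            q_C += q
    return (2 * q_S - q_C) / q_C  # P'(top) - P'(bottom), where P'(top) = q_S / q_C


# Coordinates with p_S = 0.15 and p_C_minus_S = 0.05 give bias b = 0.1 / 0.2 = 0.5.
assert abs(product_construction_bias(0.15, 0.05, 4) - amplified_bias(0.5, 4)) < 1e-12

# The bound of Lemma 3.2.4 on a grid of biases b in (0, 1) and k = 1..40.
for k in range(1, 41):
    for j in range(1, 100):
        b = j / 100
        assert amplified_bias(b, k) >= 1 - math.exp(-k * b) - 1e-12
```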



Families of distributions, post-selected 

The following definition is for post-selective classes of decision languages. 
Definition 3.2.5 (Cf. Definition 3.1.7). Let $\mathcal{P} = \{P_i : i \in I\}$ be a family of probability distributions, indexed by some totally ordered indexing set $I$. Let $\mathcal{L}_S \subseteq \mathcal{L}_C$ be a nested pair of decision languages. Let $c$ be a constant in $(0, \tfrac12)$. For every index $i$ for which $P_i(\mathcal{L}_C) \neq 0$, the ratio $P_i(\mathcal{L}_S)/P_i(\mathcal{L}_C)$ lies in one of the three partitions $[0, c]$, $(c, 1-c)$, or $[1-c, 1]$, and we can tri-partition the set of all such indices accordingly. If $P_i(\mathcal{L}_C) \neq 0$ for all $i$, and the middle partition turns out to be empty, then we define the semantic decision class $\mathrm{Post}_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\mathcal{P})$ to be the third partition:

$$\mathrm{Post}_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\mathcal{P}) := \{ i \in I : P_i(\mathcal{L}_S) \ge (1-c) \cdot P_i(\mathcal{L}_C) \},$$

which is in fact independent of $c$ whenever defined.

Although not operationally relevant, this definition is just as general as the earlier 
one for bounded probability, again making no reference to the origin or complexity 
of the distributions in question. 

This definition can be used as an alternate way of constructing classes such as $\mathrm{BPP}_{\mathrm{path}}$ or PostBQP. For example, a generic $\mathrm{BPP}_{\mathrm{path}}$ decision language can be defined in the form $\mathrm{Post}_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\mathcal{P})$ by fixing some polynomial-time randomized Turing machine $M$ and taking $P_i$ to be the distribution of the output string of $M$ on input the string $i$. Here $\mathcal{L}_S$ and $\mathcal{L}_C$ could simply be the sets of all strings beginning "11..." and "1..." respectively.
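In the spirit of the $\mathrm{BPP}_{\mathrm{path}}$ example above, here is a small Python sketch that enumerates every coin string of a randomized procedure to obtain $P_i$ exactly. It then locates $i$ in the tri-partition of Definition 3.2.5, using the prefix languages "11..." and "1...". The `toy_machine` is entirely hypothetical. It makes the post-selection event rare (probability $2^{-3}$) and, conditioned on that event, reports the parity of $i$.

```python
from fractions import Fraction
from itertools import product


def output_distribution(machine, i, num_coins):
    """Exact output distribution of a randomized procedure, by enumerating all coin strings."""
    dist = {}
    for coins in product("01", repeat=num_coins):
        out = machine(i, "".join(coins))
        dist[out] = dist.get(out, 0) + Fraction(1, 2 ** num_coins)
    return dist


def post_partition(dist, c, prefix_S="11", prefix_C="1"):
    """Locate one index in the tri-partition of Definition 3.2.5.

    Returns True (third partition, so i is in Post), False (first partition),
    or None (middle partition, so Post is undefined for the family).
    """
    p_C = sum(p for s, p in dist.items() if s.startswith(prefix_C))
    p_S = sum(p for s, p in dist.items() if s.startswith(prefix_S))
    if p_C == 0:
        raise ValueError("the post-selection event has probability zero")
    ratio = p_S / p_C
    if ratio >= 1 - c:
        return True
    if ratio <= c:
        return False
    return None


def toy_machine(i, coins):
    """Hypothetical machine: outputs starting with "1" occur only when every coin is 1,
    and in that case the second output bit reports the parity of i."""
    if coins == "1" * len(coins):
        return "1" + str(i % 2)
    return "0"
```

With polynomially many coins the post-selection event can be exponentially rare, which is why such classes are non-operational. Enumeration is feasible here only because there are three coins.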

Non-operational simulation 

It will not have escaped the reader's notice that we have tried to make our post- 
selective constructions and discussions of section 3.2 follow a parallel course to the 
constructions and discussions of section 3.1. And so it remains to prove one more 
analogous Proposition. 

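Proposition 3.2.6 (Cf. Propos. 3.1.8). If $\mathcal{L}_S \subseteq \mathcal{L}_C$ are decision languages, and $\mathcal{P}$ and $\mathcal{Q}$ are two families of probability distributions for which both $\mathrm{Post}_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\mathcal{P})$ and $\mathrm{Post}_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\mathcal{Q})$ are defined, then the following implication is valid.

$$\lim_{i\to\infty}\|P_i/Q_i\|_\infty = 0 \implies \mathrm{Post}_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\mathcal{P}) \sim \mathrm{Post}_{\mathcal{L}_S \subseteq \mathcal{L}_C}(\mathcal{Q}),$$

where the relation $\sim$ denotes set equality up to finite difference.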

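Proof. Let $c_p$ and $c_q$ be the two constants in $(0, \tfrac12)$ used in the definitions of the post-selective languages $\mathrm{Post}(\mathcal{P})$ and $\mathrm{Post}(\mathcal{Q})$ respectively. By Proposition 3.2.2 we have

$$\max_{C} \left| \log \frac{P_i(C)}{Q_i(C)} \right| \to 0.$$

It follows that there must be some real sequence $r_i$ (for instance $r_i = \exp\|P_i/Q_i\|_\infty$), tending to 1 from above, such that the values $\frac{P_i(\mathcal{L}_S)}{Q_i(\mathcal{L}_S)}$ and $\frac{P_i(\mathcal{L}_C)}{Q_i(\mathcal{L}_C)}$ both lie in $[1/r_i, r_i]$. Therefore, every time $i \in \mathrm{Post}(\mathcal{Q}) \setminus \mathrm{Post}(\mathcal{P})$, it follows that

$$r_i^2 \ge \frac{Q_i(\mathcal{L}_S)}{Q_i(\mathcal{L}_C)} \cdot \frac{P_i(\mathcal{L}_C)}{P_i(\mathcal{L}_S)} \ge \frac{1-c_q}{c_p}.$$

This means that $r_i \ge \sqrt{(1-c_q)/c_p} > 1$, which can occur only finitely often since $r_i \to 1$. The case $i \in \mathrm{Post}(\mathcal{P}) \setminus \mathrm{Post}(\mathcal{Q})$ is handled likewise, and so the symmetric difference of the two languages must be finite. □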

Synopsis 

The key idea for our understanding of post-selection concepts is this. Whenever an operational paradigm for quantum computing is proposed, it may be a hard or impossible task to show that it is universal for BQP. It may be much easier to show that its natural post-selective variant (with suitable limitations on $\mathcal{L}_S$ and $\mathcal{L}_C$) is universal for PP (Aaronson [1] has shown PP to be equal to PostBQP). There is no known way to establish the equivalence of $\mathrm{BPP}_{\mathrm{path}}$ with PP. Any such demonstration of post-selective PP-universality is therefore tantamount to a proof that there will be no oblivious classical simulation strategy for the operational version of the paradigm. This means that one can argue for 'genuinely quantum computational effects' or 'intractability of simulation' without needing a full-blown BQP-powerful architecture. We will put this into practice in Chapter 5.

3.3 Computing with Mixed States 

In this section, we consider formalising the notion of "computing with just one pure qubit", a paradigm that was introduced in [41] and further investigated in [8]. In this paradigm, arbitrary quantum circuits are allowed, but the input to the quantum circuit must have very limited purity. Moreover, the input is fixed, and so cannot be used to index the elements of a decision language. We shall explore what can be done with uniform families of quantum circuits in this paradigm, introduce a particular model² and class of decision languages, and argue for why the model is aptly described using the terminology of the present Chapter.

3.3.1 Overview 

In [41], Knill and Laflamme considered an extreme limitation on state purity by asking for computations that have only one pure qubit, the other qubits being fully depolarised. They called this paradigm "DQC1".³ They asked what can be computed if one allows arbitrary quantum circuits of polynomial length, taking input as described above, and measuring a single qubit to obtain an output of the computation.

² I posted research notes on this subject on the arXiv in 2006, but didn't pursue publication of them at the time.

³ The 'D' here stands for 'deterministic', which is being used to mean what I have elsewhere termed 'operational'.

It is convenient to denote mixed states using the density operator formalism. A density operator is essentially⁴ the quantum generalisation of a probability distribution. This operator encodes all the information about the propensity for how a state will behave in relation to all possible measurements. (With respect to a computational basis, the diagonal entries of a matrix representation of such an operator form an actual probability distribution.) Accordingly, using a discrete-time model and a finite-dimensional unitary space, one can take the state space for mixed quantum computation to be the convex hull (in the space of linear functions) of rank-1 Hermitian projectors on the unitary space, rather than the unitary space itself.

After outlining prior work on this subject, we shall introduce a more formal way of expressing operationally relevant decision languages that naturally belong to this paradigm, and show that the "one pure qubit" analogue of BQP contains the class ⊕L. We shall also provide oracle separations, both ways (one of which is new), between this class and P.

3.3.2 Prior work 

The physical motivation for the DQC1 paradigm comes from architectures based on Nuclear Magnetic Resonance (NMR), where purity of quantum state is hard to come by. For NMR computing, the mixed state that one is forced to work with has its mixedness spread across all qubits, so that they are initialised in a 'hot' state of the form

$$\frac{1+\epsilon}{2}\,|0\rangle\langle 0| + \frac{1-\epsilon}{2}\,|1\rangle\langle 1|. \tag{3.5}$$



But using an analogy from thermodynamics, in [56] it is shown how to build an efficient unitary circuit that 'distils' out purity with high probability, leaving a state that is close to $|0\rangle\langle 0|$ on the first $n - n \cdot H(\epsilon) - o(1)$ qubits. (Here $H$ measures the entropy of the state, and so the limit is effectively tight.) And so (cf. [8]), provided such transformations are reasonable within one's computational model, it is no loss of generality to restrict one's attention to the more 'digital' perspective whereby the initialisation state (as a density operator in an algebra of $w$ qubits) is taken to be

$$\rho_{\mathrm{start}}(k, w) = \frac{\prod_{j=1}^{k} (1 + Z_j)}{2^w}, \tag{3.6}$$

where $w$ counts the total number of qubits and $k \le w$ counts the number which are pure, with the standard Pauli operators being used to describe density operators. (For subscript notation throughout this section, we take qubits $[1..k]$ to be the ones initially pure, and $[k+1..w]$ to be the ones initially depolarised.)

⁴ Under the Everettian interpretation, density operators are taken to describe objective states rather than subjective states of knowledge, but just as the nature of probability is philosophically ambiguous, the same can be said for density operators.

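If one begins with state $\rho_{\mathrm{start}}(1, w)$, applies an arbitrary unitary map $W_{[1..w]}$ across all qubits, and then measures the first qubit in the computational basis, one obtains $|0\rangle$ with bias

$$\mathrm{Tr}\big[ (W_{[1..w]} \cdot \rho_{\mathrm{start}}(1, w) \cdot W^\dagger_{[1..w]}) \cdot Z_1 \big] = 2^{-w} \cdot \mathrm{Tr}\big[ W_{[1..w]} \cdot Z_1 \cdot W^\dagger_{[1..w]} \cdot Z_1 \big], \tag{3.7}$$

since the identity part of $\rho_{\mathrm{start}}(1, w)$ contributes only $2^{-w} \cdot \mathrm{Tr}[Z_1] = 0$.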
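The following NumPy sketch builds the start state (3.6), $2^{-w}\prod_{j \le k}(1 + Z_j)$, as a dense matrix. It assumes the convention that qubit 1 is the leftmost tensor factor. It then confirms that this equals $|0\cdots0\rangle\langle0\cdots0|$ on the $k$ pure qubits tensored with the maximally mixed state on the other $w - k$, and that its diagonal is an honest probability distribution.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])


def on_qubit(op, j, w):
    """Embed a single-qubit operator on qubit j of w (1-indexed; qubit 1 is the leftmost factor)."""
    return reduce(np.kron, [op if q == j else I2 for q in range(1, w + 1)])


def rho_start(k, w):
    """Line (3.6): 2^{-w} * prod_{j=1}^{k} (1 + Z_j), as a dense 2^w x 2^w matrix."""
    dim = 2 ** w
    rho = np.eye(dim)
    for j in range(1, k + 1):
        rho = rho @ (np.eye(dim) + on_qubit(Z, j, w))
    return rho / dim


rho = rho_start(2, 4)
# |00><00| on the k = 2 pure qubits, tensored with the maximally mixed state on the rest.
expected = reduce(np.kron, [(I2 + Z) / 2] * 2 + [I2 / 2] * 2)
assert np.allclose(rho, expected)
assert np.isclose(np.trace(rho), 1.0)
# Its diagonal is a genuine probability distribution, uniform over 2^(w-k) basis states.
assert np.all(np.diag(rho) >= 0) and np.isclose(np.diag(rho).sum(), 1.0)
assert np.isclose(np.trace(rho @ rho), 2.0 ** -(4 - 2))  # purity 2^{-(w-k)}
```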

Knill and Laflamme showed that this paradigm (even with just the one pure qubit) can be used to estimate the trace of a unitary operator, as outlined below. Moreover, there is a sense in which the problem of trace estimation is complete for the paradigm [65]. Given a circuit for an arbitrary unitary $V$ on $w - 1$ qubits, the unitary used to estimate the (real part of the) trace of $V$ is taken to be $W_{[1..w]} = H_1 \cdot \Lambda_1(V_{[2..w]}) \cdot H_1$, because then the measurement bias is

$$2^{-w} \cdot \mathrm{Tr}\big[ W_{[1..w]} \cdot Z_1 \cdot W^\dagger_{[1..w]} \cdot Z_1 \big] = 2^{1-w} \cdot \mathrm{Re}\,\mathrm{Tr}[V]. \tag{3.8}$$

Such biases can be amplified in the usual fashion, by parallel instantiation and majority vote (cf. Lemma 3.1.6).
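The trace-estimation identity (3.8) can be checked with a dense-matrix simulation. In this NumPy sketch, $\Lambda_1(V)$ is the block matrix $\mathrm{diag}(1, V)$ with qubit 1 as control, and the bias of line (3.7) is compared with $2^{1-w}\,\mathrm{Re}\,\mathrm{Tr}[V]$. The helper names are hypothetical, and the random unitary comes from a QR decomposition of a complex Gaussian matrix.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])


def random_unitary(dim, rng):
    """A random unitary: QR decomposition of a complex Gaussian matrix, with phases fixed."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(a)
    return q * (np.diag(r) / np.abs(np.diag(r)))


def dqc1_bias(V):
    """Bias of line (3.7) for W = H_1 . Lambda_1(V) . H_1 acting on rho_start(1, w).

    V acts on qubits [2..w]; qubit 1 is the single pure (control) qubit.
    """
    d = V.shape[0]
    zeros = np.zeros((d, d))
    ctrl_V = np.block([[np.eye(d), zeros], [zeros, V]])  # Lambda_1(V), control on qubit 1
    H1 = np.kron(H, np.eye(d))
    Z1 = np.kron(Z, np.eye(d))
    rho = (np.eye(2 * d) + Z1) / (2 * d)                 # rho_start(1, w), with 2^w = 2d
    W = H1 @ ctrl_V @ H1
    return np.trace(W @ rho @ W.conj().T @ Z1).real


rng = np.random.default_rng(7)
V = random_unitary(8, rng)  # w - 1 = 3 qubits, so w = 4
print(dqc1_bias(V), 2 ** (1 - 4) * np.trace(V).real)  # the two numbers agree, as in line (3.8)
```

A real DQC1 experiment only ever sees single-bit outcomes, reading $|0\rangle$ with probability $(1 + \text{bias})/2$. The bias is therefore estimated by averaging over repeated runs, as described above.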

Ambainis, Schulman, and Vazirani [8] showed that the non-uniform version of the complexity class $\mathrm{NC}^1$ (classical, polynomial time, logarithmic circuit depth) is computable within a non-uniform model of the "one pure qubit" DQC1 paradigm. They also showed that there is no obvious efficient way (i.e. no oblivious technique) to simulate a circuit with $k$ pure qubits using fewer pure qubits, except at the cost of exponentially decaying efficiency.

Shor and Jordan [65] discussed the differences between considering quantum circuits supplied by a polynomial-time classical computer and a more restricted classical computer computing only $\mathrm{NC}^1$. They also showed that, even in the weaker model, having logarithmically many pure qubits is no better than having just one. This holds provided it is understood that one is free to make polynomially many runs of DQC1-type experiments, with majority-vote post-processing, in order to make any specific decision.






3.3.3 Decision languages for mixed states 

What makes models within the DQC1 paradigm a little different from the usual 
notion of a computation model? 

• One is not permitted to make intermediate measurements (or other non-unitary gates) during the execution of a circuit. Otherwise such operations could be used to introduce new purity into the system, effectively boosting its power back to that of universal BQP computing (cf. §4.1.4).

• One cannot define decision languages in terms of the input into a unitary circuit in this model, because the quantum input is always constrained to be the one given at line (3.6). We shall see that this means that classical input must be interfaced via classical control of the circuit elements.

• The computational power of the model is potentially affected by how many bits can be interfaced out of the computation at measurement time: e.g. 1, $k$, or $w$.

The three points above must be addressed properly if one is to use the paradigm to define formally a class of decision languages. Before we attempt such a definition (Def. 3.3.3), let us first give an algorithm for a ⊕L-complete decision language: the task of evaluating one output bit of a polynomial-sized classical circuit composed entirely of C-Not gates. (The decision language corresponding to this class is not believed to be in $\mathrm{NC}^1$, despite the fact that matrix multiplication over the field $\mathbb{F}_2$ can be computed in logarithmic circuit depth [20, 17].)

Let $\{C_i\}$ be a family of classical circuits composed entirely of C-Not gates, such that the number of bits input to $C_i$ is equal to $i$. Let $\mathcal{L}$ be the language of strings $x$ which, when input to the appropriate $C_i$, cause the first output bit to be 1:

$$\mathcal{L} = \{ x \in \{0,1\}^* : i = \mathrm{len}(x),\ \text{first bit of } C_i(x) = 1 \}. \tag{3.9}$$
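For reference, the classical task behind line (3.9) is easy to state in code. This Python sketch evaluates a C-Not circuit using the text's bit labels $2, \ldots, i+1$ and tests membership in $\mathcal{L}$. The gate list and function names are hypothetical.

```python
def run_cnot_circuit(gates, x):
    """Evaluate a classical circuit made only of C-Not gates.

    The input bits x = [x_2, ..., x_{i+1}] are labelled 2, ..., i + 1 as in the
    text; each gate (c, t) flips bit t when bit c is set. Returns [s_2, ..., s_{i+1}].
    """
    bits = {j: b for j, b in enumerate(x, start=2)}
    for c, t in gates:
        bits[t] ^= bits[c]
    return [bits[j] for j in range(2, len(x) + 2)]


def in_language(gates, x):
    """Membership in the language of line (3.9): is the first output bit s_2 equal to 1?"""
    return run_cnot_circuit(gates, x)[0] == 1


GATES = [(3, 2), (4, 3), (2, 4)]  # a hypothetical circuit C_3 on bits 2, 3, 4
print(run_cnot_circuit(GATES, [1, 0, 1]))  # [1, 1, 0]
```

Each C-Not is $\mathbb{F}_2$-linear, so the whole circuit is too: $C_i(x \oplus y) = C_i(x) \oplus C_i(y)$. The DQC1 construction that follows relies on this linearity.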

Now let $x$ be some particular input string, and let $i = \mathrm{len}(x)$. Let the $i$ bits of the string $x$ be denoted $x_2, x_3, \ldots, x_{i+1}$. Let $s_2, s_3, \ldots, s_{i+1}$ be the output bits of $C_i(x)$, so that $s_2$ is the bit whose setting decides whether $x \in \mathcal{L}$. Suppose we wish to determine whether or not $x \in \mathcal{L}$, using some DQC1-style computation. To do this, we must specify a quantum circuit, denoted $W(x)$, that will be used to compute the value $s_2$. It must be designed in some appropriately uniform manner (relative to the uniformity of the family $\{C_i\}$, see below).

Let $w = i + 1$ measure the total number of qubits on which our circuit will act, so that it makes sense to apply our circuit $W(x)$ to the state $\rho_{\mathrm{start}}(1, w)$ in accordance with line (3.7).
Next, let $V_i(x)$ be the circuit on qubits $[1..w]$ that consists of one C-Not gate from qubit 1 to qubit $j$ each time that bit $x_j$ is set. This we notate

$$V_i(x)_{[1..w]} := \prod_{j=2}^{w} \Lambda_1(X_j)^{x_j}. \tag{3.10}$$

Informally, we say that this circuit will be used to interface the information contained within the string $x$ to the 'DQC1 algorithm' that we are designing. More formally, the incorporation of $V_i(x)$ as a 'subroutine' within $W(x)$ is to be the only way in which $W(x)$ depends on $x$, so that a sensible notion of uniformity applies to the family $\{W(x)\}$.

Let $H^{\otimes w}$ denote the Hadamard gate being applied to every qubit (cf. §4.1.1). Let $C_i$ be recast as a quantum circuit to be applied on qubits $[2..w]$. Finally, let $U(x) := H^{\otimes w} \cdot C_i \cdot V_i(x) \cdot H^{\otimes w}$, and define $W(x)$ to be the overall circuit given by

$$W(x)_{[1..w]} := U(x)^\dagger_{[1..w]} \cdot X_2 \cdot U(x)_{[1..w]}. \tag{3.11}$$

Lemma 3.3.1. When the circuit $W(x)$ defined above is applied to the state $\rho_{\mathrm{start}}(1, w)$, the bias as specified at line (3.7) is always either 1 or $-1$, and is $-1$ exactly when $x \in \mathcal{L}$ as defined at line (3.9).

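Proof. We claim that the effect of $U(x) = U(x)_{[1..w]}$ on $\rho_{\mathrm{start}}(1, w)$ is to map it to

$$U(x)_{[1..w]} \cdot \frac{1 + Z_1}{2^w} \cdot U(x)^\dagger_{[1..w]} = \frac{1 + Z_1 \prod_{j=2}^{w} Z_j^{s_j}}{2^w}. \tag{3.12}$$

Perhaps the easiest way to see why this claim holds is to regard $\rho_{\mathrm{start}}(1, w)$ as being the proper uniform mix of pure states $|0\rangle_1 |r\rangle_{[2..w]}$, where $r$ ranges over all $i$-bit strings. Since $C_i$ computes an invertible $\mathbb{F}_2$-linear map, we may write $R$ for $(C_i^{-1})^{\mathsf T} r$. Then $r \cdot y = R \cdot C_i(y)$ for every $y$, and $R$ ranges over all $i$-bit strings as $r$ does. The effect of $U(x)$ on $|0\rangle|r\rangle$ is readily seen to follow from

$$\begin{aligned}
V_i(x) \cdot H^{\otimes w} |0\rangle|r\rangle &= 2^{-w/2} \sum_y (-1)^{r \cdot y} \big( |0\rangle|y\rangle + |1\rangle|y \oplus x\rangle \big), \\
C_i \cdot V_i(x) \cdot H^{\otimes w} |0\rangle|r\rangle &= 2^{-w/2} \sum_y (-1)^{r \cdot y} \big( |0\rangle|C_i(y)\rangle + |1\rangle|C_i(y \oplus x)\rangle \big) \\
&= 2^{-w/2} \sum_y (-1)^{R \cdot y} \big( |0\rangle|y\rangle + |1\rangle|y \oplus C_i(x)\rangle \big), \\
U(x)|0\rangle|r\rangle &= \tfrac12 \big( (1 + (-1)^{R \cdot C_i(x)})|0\rangle + (1 - (-1)^{R \cdot C_i(x)})|1\rangle \big) |R\rangle = |R \cdot C_i(x)\rangle |R\rangle.
\end{aligned} \tag{3.13}$$

The proper mix (over $r$) of these states must conform to the claim of line (3.12). Application of $X_1$ causes each state to map to something orthogonal, as does application of $X_j$ exactly when $s_j$ is set, whereas application of $Z_1$ or $Z_j$ causes no physical change. Equivalently, each state $|R \cdot C_i(x)\rangle|R\rangle$ is a $+1$ eigenvector of $Z_1 \prod_{j} Z_j^{s_j}$: its first bit is the parity of the bits of $R$ selected by $C_i(x) = (s_2, \ldots, s_w)$. As $R$ ranges over all $i$-bit strings, the mixture is uniform over that $2^{w-1}$-dimensional eigenspace.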

From line (3.11), the action of $W(x)$ proceeds in three stages. It 'computes' the $s_j$ bits in the sense of line (3.12) by applying $U(x)$. It then 'kicks' the value of $s_2$ into the internal phase of the state by applying $X_2$, which anticommutes with $Z_1 \prod_j Z_j^{s_j}$ exactly when $s_2 = 1$. Finally it 'uncomputes' the $s_j$ bits by applying $U(x)^\dagger$, so that we are left with

$$W(x) \cdot \frac{1 + Z_1}{2^w} \cdot W(x)^\dagger = \frac{1 + (-1)^{s_2} Z_1}{2^w}. \tag{3.14}$$

The bias for this state (cf. line (3.7)) is $(-1)^{s_2}$, as required. □
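Lemma 3.3.1 can be confirmed on a small instance by dense-matrix simulation. The NumPy sketch below builds $V_i(x)$ (line (3.10)), a hypothetical C-Not circuit $C_3$ on qubits 2 to 4, $U(x)$, and $W(x)$ (line (3.11)). It applies $W(x)$ to $\rho_{\mathrm{start}}(1, 4)$ and prints the resulting bias next to the classically computed $(-1)^{s_2}$. It assumes qubit 1 is the leftmost tensor factor. It is exponential in $w$ and meant only as a check, not as an implementation of the DQC1 algorithm.

```python
import itertools
from functools import reduce

import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)


def on_qubit(op, j, w):
    """Single-qubit operator on qubit j of w (1-indexed; qubit 1 is the leftmost factor)."""
    return reduce(np.kron, [op if q == j else I2 for q in range(1, w + 1)])


def apply_cnots(gates, bits):
    """Classically apply C-Not gates (control, target), qubits 1-indexed, to a bit tuple."""
    bits = list(bits)
    for c, t in gates:
        bits[t - 1] ^= bits[c - 1]
    return bits


def permutation_unitary(gates, w):
    """The unitary |b> -> |f(b)>, where f is the basis permutation computed by the gates."""
    P = np.zeros((2 ** w, 2 ** w))
    for b in itertools.product([0, 1], repeat=w):
        image = apply_cnots(gates, b)
        P[int("".join(map(str, image)), 2), int("".join(map(str, b)), 2)] = 1.0
    return P


def lemma_331_bias(gates, x):
    """Bias of line (3.7) for W(x) = U(x)^dag X_2 U(x), with U(x) = H^w C_i V_i(x) H^w.

    `gates` is a C-Not circuit C_i on qubits 2..w, and x = [x_2, ..., x_w].
    """
    w = len(x) + 1
    Hw = reduce(np.kron, [H] * w)
    V = permutation_unitary([(1, j) for j in range(2, w + 1) if x[j - 2]], w)  # line (3.10)
    C = permutation_unitary(gates, w)
    U = Hw @ C @ V @ Hw
    W = U.conj().T @ on_qubit(X, 2, w) @ U                                      # line (3.11)
    Z1 = on_qubit(Z, 1, w)
    rho = (np.eye(2 ** w) + Z1) / 2 ** w                                        # rho_start(1, w)
    return np.trace(W @ rho @ W.conj().T @ Z1).real


GATES = [(3, 2), (4, 3), (2, 4)]  # a hypothetical C_3 acting on qubits 2, 3, 4
for x in itertools.product([0, 1], repeat=3):
    s2 = apply_cnots(GATES, (0,) + x)[1]  # classical value of s_2, the first bit of C_3(x)
    print(x, round(lemma_331_bias(GATES, list(x)), 9), (-1) ** s2)
```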

The processing involved in the construction of Lemma 3.3.1 is achieved efficiently and deterministically (the final measurement returning a classical deterministic bit), using just one pure qubit. However, the circuit $W(x)$ that provided the processing of data within the quantum memory had to incorporate two copies of $V_i(x)$. That is, the algorithm required the ability to 'read' the input bit-string twice, each time reading its bits in arbitrary order.

3.3.4 BQ[k]P and parity-control

Here we offer a definition for a class of decision languages, based on the ideas used 
within the construction of Lemma 3.3.1, but generalised to allow for computations 
that are not deterministic. 

Besides the parameters k and w for determining the initial quantum state, we also 
need a parameter i to determine the length of the classical string that will be used to 
control some of the gates within the circuit, which is the same string that the circuit 
is effectively 'deciding' on. And we need another parameter b that describes the 
magnitude of the bias that the circuit must produce for all valid inputs, since very 
tiny biases are not to be considered operationally significant (cf. §3.1). Parameters 
k,w,b will all be taken to be functions of the argument i. 

Finally, we need a sensible mechanism for describing how the bits of the classical input string $x$ will control the circuit's gates. It seems appropriate to adopt parity-control. This means that if a gate is subject to classical control (e.g. just as the gates at line (3.10) depend on classical bits from $x$), it will be controlled by an $\mathbb{F}_2$-linear function of the input $x$.

Definition 3.3.2. A gate $U$ is said to be under parity-control from the input string $x$ according to the control specification string $c$ if the gate is applied (in its turn) when the circuit is executed if and only if the derived bit $c \cdot x$ (the parity of the bits of $x$ selected by $c$) is set. This parity-controlled gate is denoted $U^{c \cdot x}$.

That is, a gate from a quantum circuit may have included within its description an arbitrary but explicit control specification string $c$, to describe how the string $x$ should affect whether or not the gate is to be applied. The length of the control specification string $c$ should obviously match the length of the input string $x$, which is $i$. This device is a generalisation of the classical notion of a sequential branching program studied in [11] for example, although less directly related to the graphical Q-branching programs discussed in [20].

Note that when the number of pure qubits is not limited, there is a trivial reduction from 'ordinary' quantum circuits (with classical input directly made quantum in the computational basis) to parity-controlled circuits with 'null' quantum input $|0\rangle$. The first thing the parity-controlled circuit would do to simulate the ordinary circuit is to implement $X$ on each qubit $j$, controlled by the parity of the single input bit $x_j$ (assuming of course that $X$ is amongst the allowable quantum gates). Having done this, the rest of the simulating circuit would just proceed with the simulated circuit 'uncontrolled' by classical input bits. And so parity-controlled circuits can usefully be standardised in paradigms other than DQC1. This is particularly appropriate whenever one has no need for the concepts of circuit composition and quantum communication, or no notion of preprocessing classical data before forming quantum data from it (i.e. quantum input/output).
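Definition 3.3.2 and the trivial reduction above can be mimicked with plain bookkeeping. In this Python sketch, gates are represented by hypothetical string labels, each paired with a control specification string $c$ (or `None` for an uncontrolled gate). The sketch only decides which gates fire on input $x$; it does not simulate any quantum state.

```python
def parity_bit(c, x):
    """The derived bit c . x over F_2, for a control specification c and input x."""
    if len(c) != len(x):
        raise ValueError("the control specification must have the same length as x")
    return sum(ci & xi for ci, xi in zip(c, x)) % 2


def gates_applied(circuit, x):
    """Which gates of a parity-controlled circuit actually fire on input x.

    `circuit` is a list of (gate_label, c) pairs. c = None marks an uncontrolled
    gate; otherwise the gate is U^{c.x} of Definition 3.3.2, applied iff c . x = 1.
    """
    return [label for label, c in circuit if c is None or parity_bit(c, x) == 1]


def input_loading_circuit(i):
    """The trivial reduction: X on qubit j, controlled by the single input bit x_j."""
    return [(f"X_{j}", [1 if k == j else 0 for k in range(1, i + 1)]) for j in range(1, i + 1)]


print(gates_applied(input_loading_circuit(3), [1, 0, 1]))  # ['X_1', 'X_3']
print(gates_applied([("H_1", None), ("Z_2", [1, 1, 0])], [1, 1, 0]))  # ['H_1']
```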

Here then is a definition for a DQC1-style complexity class, informed by the discussion above and by Definition 3.1.7 of §3.1.

Definition 3.3.3. Consider a uniform family of quantum circuits $\{W(i)\}_{i=1}^{\infty}$, some of whose gates may be under parity-control. Let $k = k(i) \le w = w(i)$ be a pair of polynomially bounded complexity functions, with $w(i)$ counting the width of $W(i)$. Let $0 < b = b(i) = \Omega(1/\mathrm{poly}(i)) \le 1$ be another function. Then partition up the set of all $x \in \{0,1\}^*$ according to which of the three sets

$$[-1, -b], \qquad (-b, b), \qquad [b, 1]$$

contains the bias

$$\mathrm{Tr}\big[ W(i)_{[1..w]} \cdot \rho_{\mathrm{start}}(k, w) \cdot W(i)^\dagger_{[1..w]} \cdot Z_1 \big],$$

where the argument $i = \mathrm{len}(x)$ is used throughout, and the parity-controlled gates of $W(i)$ are controlled by $x$. If the middle partition turns out to be empty (no string $x$ causes a negligible bias), then we define the semantic decision language $\mathcal{L}_{k,\mathcal{W},w,b}$ to be the third partition:

$$\mathcal{L}_{k,\mathcal{W},w,b} := \{ x \in \{0,1\}^* : i = \mathrm{len}(x),\ \mathrm{Tr}[W(i) \cdot \rho_{\mathrm{start}}(k, w) \cdot W(i)^\dagger \cdot Z_1] \ge b \}.$$

The class BQ[$k$]P contains all such $\mathcal{L}_{k,\mathcal{W},w,b}$ for that value of $k$. (The union of all these classes is clearly BQP.)
This definition is based on Definition 3.1.7, but an important difference is that
since the one-pure-qubit model has no apparent way to amplify bias within the 
quantum part of computation, we instead allow for polynomially small bias rather 
than constant bias. 

Here is the main result of this section:

Corollary 3.3.4. ⊕L ⊆ BQ[1]P.

Proof. Definition 3.3.3 clearly allows scope for our algorithm of Lemma 3.3.1 to ensure that a ⊕L-complete language is contained within BQ[1]P.

Moreover, the result of Shor and Jordan [65] that logarithmically many pure qubits are no more useful than a single one likewise holds under this definition, with essentially no modification to their proof, so that BQ[log]P = BQ[1]P.

But it is trivial that L ⊆ BQ[log]P, completing the argument. □

One may think of the structure $\{\mathrm{BQ}[k]\mathrm{P}\}_k$ as forming a hierarchy that reaches from the simplest model of the paradigm (one pure qubit) up to full BQP universality (BQ[poly]P). In [8], it is shown that an 'oblivious' simulation of a program in this hierarchy by a program much lower in the hierarchy is impossible. But now we see that a formal unconditional proof of this hierarchy's not collapsing would constitute an unconditional separation between BQ[1]P and BQP. By Corollary 3.3.4 it would thence also imply an unconditional separation between (say) ⊕L and PP (cf. [1], and also §1.4.1).

3.3.5 Oracle separations for BQ[1]P 

One can use the notion of an oracle (§1.3.2) to make a formal relativised separation between complexity classes. In this context, an oracle would take the form of a (non-uniform) family of permutations on the set $\mathbb{F}_2^m$, supplied as so-called "black-box unitaries" or classically as "black-box functions".

Proposition 3.3.5. There is a "black-box" oracle $O$ for which $\mathrm{P}^O \not\subseteq \mathrm{BQ}[1]\mathrm{P}^O$.

Proof. An example is given in [41], showing implicitly why certain 'classically easy' 
facts about an oracle cannot be learned using only DQC1 methodology. The same 
proof works for this Proposition, with only very minor changes. □ 

Simon's algorithm [66] provides an oracle for establishing a separation of the form $\mathrm{BQP}^O \not\subseteq \mathrm{BPP}^O$. With a small change, the same kind of oracle establishes the converse to Proposition 3.3.5, as follows.

Proposition 3.3.6. There is a "black-box" oracle $O$ for which $\mathrm{BQ}[1]\mathrm{P}^O \not\subseteq \mathrm{P}^O$.
Proof. The oracle in Simon's algorithm [66] is based on randomly selected 'hidden shift' functions $f_n : \mathbb{F}_2^n \to \mathbb{F}_2^m$, with $f_n(x) = f_n(z) \iff x + z \in \{0, s\}$ for some random non-zero vector $s$. Instead, generalise this to have $f_n$ be a random function that is constant on cosets of some large subspace $S_n \le \mathbb{F}_2^n$. Further generalise by taking a similar function $g_n$ as a random function that is constant on cosets of some other large subspace $T_n$. We shall also need that these functions are distinct on distinct cosets.

The oracle is considered to provide a family of such functions in the usual fashion, parameterised by $n$. We can let $m$ be any polynomial function of $n$, so we'll pick $m = m(n) = 2n$ for a concrete example.

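Using these random functions, define the following permutation-unitaries on $w = w(n) = n + 2m(n)$ qubits (where $|a, b, c\rangle = |a\rangle_{[1..n]} |b\rangle_{[n+1..n+m]} |c\rangle_{[n+m+1..n+2m]}$):

$$\begin{aligned}
U_{[1..w]} &: |a, b, c\rangle \mapsto |a, b + f_n(a), c\rangle, \\
U'_{[1..w]} &: |a, b, c\rangle \mapsto |a, b, c + g_n(a)\rangle, \\
V_{[1..w]} &:= \big[ H^{\otimes n}_{[1..n]} \cdot U_{[1..w]} \cdot H^{\otimes n}_{[1..n]} \cdot U'_{[1..w]} \big]^2.
\end{aligned} \tag{3.15}$$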

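Now let's evaluate the trace of $V$:

$$\begin{aligned}
2^{-w} \cdot \mathrm{Tr}[V] &= 2^{-w} \sum_{a,b,c} \langle a, b, c |\, H^{\otimes n} U H^{\otimes n} U' H^{\otimes n} U H^{\otimes n} U' \,| a, b, c \rangle \\
&= 2^{-w-2n} \sum_{a,b,c,x,y,z} (-1)^{(a+y)\cdot(x+z)} \, \langle b + f_n(x), c + g_n(y) \,|\, b + f_n(z), c + g_n(a) \rangle.
\end{aligned} \tag{3.16}$$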



The only terms here that won't vanish are those whereby $(x + z) \in S_n$ and $(a + y) \in T_n$. This uses the fact that the functions $f_n$ and $g_n$ are distinct on different cosets of $S_n$ and $T_n$ respectively, but otherwise constant. So we make a change of variables, $s = x + z$ and $t = a + y$. Then

$$2^{-w} \cdot \mathrm{Tr}[V] = 2^{-3n-2m} \sum_{b,c,x,y} \sum_{s \in S_n,\, t \in T_n} (-1)^{s \cdot t} = 2^{-n} \sum_{s \in S_n,\, t \in T_n} (-1)^{s \cdot t}. \tag{3.17}$$

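If we are careful to ensure that the dimension of $S_n$ matches the codimension of $T_n$, so that $|S_n| \cdot |T_n| = 2^n$, then this expression further simplifies to

$$2^{-w} \cdot \mathrm{Tr}[V] = \begin{cases} 1 & \text{if } S_n \perp T_n, \\ \le \tfrac12 & \text{if } S_n \not\perp T_n. \end{cases} \tag{3.18}$$

To see this, note that the inner sum $\sum_{t \in T_n} (-1)^{s \cdot t}$ equals $|T_n|$ when $s \in T_n^\perp$ and vanishes otherwise, so (3.17) equals $|S_n \cap T_n^\perp| / |S_n|$. This is 1 if $S_n \subseteq T_n^\perp$. Otherwise $S_n \cap T_n^\perp$ is a proper subspace of $S_n$, of at most half its size.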

One can use the trace-estimation algorithm (§3.3.2) to distinguish these two cases. The trace-estimation algorithm requires implementing $\Lambda(V)$ once, and each implementation of $V$ makes use of four oracle calls. It follows that four oracle calls per run are sufficient for distinguishing between the two cases of "orthogonal cosets" versus "non-orthogonal cosets". Since the two cases differ in bias by at least $\tfrac12$, a constant number of runs suffices.
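Equations (3.17) and (3.18) can be checked exhaustively for small $n$. This Python sketch encodes vectors of $\mathbb{F}_2^n$ as integer bitmasks and uses hypothetical helper names. It enumerates every pair of subspaces $S_n, T_n \le \mathbb{F}_2^4$ with $\dim S_n = \operatorname{codim} T_n$ and evaluates $2^{-n}\sum_{s\in S_n, t\in T_n}(-1)^{s\cdot t}$. It confirms the value is 1 in the orthogonal case and at most $\tfrac12$ otherwise.

```python
from itertools import combinations


def dot(u, v):
    """Inner product over F_2 of two vectors encoded as integer bitmasks."""
    return bin(u & v).count("1") % 2


def span(vectors):
    """The set of all F_2-linear combinations of the given bitmask vectors."""
    space = {0}
    for v in vectors:
        space |= {s ^ v for s in space}
    return frozenset(space)


def subspaces(n, d):
    """Every d-dimensional subspace of F_2^n, by brute force (small n only)."""
    found = set()
    for basis in combinations(range(1, 2 ** n), d):
        sp = span(basis)
        if len(sp) == 2 ** d:
            found.add(sp)
    return found


def normalised_trace(S, T, n):
    """Right-hand side of line (3.17): 2^{-n} times the sum over s in S, t in T of (-1)^{s.t}."""
    return sum((-1) ** dot(s, t) for s in S for t in T) / 2 ** n


def orthogonal(S, T):
    return all(dot(s, t) == 0 for s in S for t in T)


n = 4
for d in range(n + 1):  # dim S_n = d = codim T_n, so |S_n| * |T_n| = 2^n
    candidates_T = subspaces(n, n - d)
    for S in subspaces(n, d):
        for T in candidates_T:
            value = normalised_trace(S, T, n)
            if orthogonal(S, T):
                assert value == 1          # first case of line (3.18)
            else:
                assert value <= 0.5        # second case of line (3.18)
```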

This quantum black-box algorithm therefore solves a certain promise-problem, but can the same problem be solved classically efficiently? No, because in the worst case, exponentially many samples of $f_n$ and $g_n$ are needed. If the dimension and codimension of each of $S_n$ and $T_n$ is $n/2$, then the domains of each of $f_n$ and $g_n$ partition into $2^{n/2}$ different cosets, on which different values are taken. There need be no other structure in $f_n$ and $g_n$, and so there is no efficient way even to find an element of $S_n$ or $T_n$.

We formalise this idea next. Suppose a classical algorithm were to sample each of $f_n$ and $g_n$ at any $2^{n/4}$ points. We show that it would then be possible that no two samples of $f_n$ were found to be the same and neither were two samples of $g_n$ the same. Moreover, there would exist a consistent choice of $S_n$ and $T_n$ with $S_n \perp T_n$, as well as a different consistent choice with $S_n \not\perp T_n$. Therefore the algorithm would fail, which establishes a classical (deterministic worst-case) lower bound of $2^{n/4}$ queries.

Suppose $2^{n/4}$ queries are made of $f_n$. That amounts to $2^{n/4-1}(2^{n/4} - 1)$ pairs of (unequal) points sampled. The two samples of any pair being different is the same thing as the (non-zero) sum of those two points lying outside $S_n$. Now the number of non-zero points in $\mathbb{F}_2^n$ is plainly $2^n - 1$, and the number of non-zero points in any candidate subspace $S_n$ of dimension $n/2$ is $2^{n/2} - 1$. Therefore any point being declared to lie outside of $S_n$ denies a proportion $(2^{n/2} - 1)/(2^n - 1) = (2^{n/2} + 1)^{-1}$ of the possibilities for $S_n$. (Think of a bipartite graph between non-zero points of $\mathbb{F}_2^n$ and subspaces of dimension $n/2$.) Therefore our samples, if they do all turn out to be distinct, must certainly preclude fewer than half of all candidate $S_n$ subspaces, since $2^{n/4-1}(2^{n/4} - 1) \cdot (2^{n/2} + 1)^{-1} < \tfrac12$. The same reasoning applies to $T_n$.

To each candidate $S_n$ there is precisely one $T_n$ (namely its dual) for which $S_n \perp T_n$, and plenty of other $T_n$ for which $S_n \not\perp T_n$. More than half of all possible $S_n$ and more than half of all possible $T_n$ remain as candidates, and duality is a bijection on subspaces of dimension $n/2$. So by pigeonhole it must be possible to find a remaining pair such that $S_n \perp T_n$, as well as a remaining pair for which $S_n \not\perp T_n$. Since both possibilities are available, no deterministic algorithm having made $2^{n/4}$ queries can possibly solve the problem in the worst case. □
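The counting argument can also be checked mechanically. The sketch below uses Python with exact `Fraction` arithmetic and hypothetical helper names. It verifies the final inequality for every $n$ divisible by 4 up to 200. By brute force for $n = 4$, it also verifies that a fixed non-zero vector lies in a fraction $(2^{n/2} + 1)^{-1}$ of the $n/2$-dimensional subspaces, and that each such subspace is orthogonal to exactly one candidate (its dual).

```python
from fractions import Fraction
from itertools import combinations


def dot(u, v):
    """Inner product over F_2 of two vectors encoded as integer bitmasks."""
    return bin(u & v).count("1") % 2


def span(vectors):
    """The set of all F_2-linear combinations of the given bitmask vectors."""
    space = {0}
    for v in vectors:
        space |= {s ^ v for s in space}
    return frozenset(space)


def subspaces(n, d):
    """Every d-dimensional subspace of F_2^n, by brute force (small n only)."""
    found = set()
    for basis in combinations(range(1, 2 ** n), d):
        sp = span(basis)
        if len(sp) == 2 ** d:
            found.add(sp)
    return found


def precluded_fraction_bound(n):
    """(number of sampled pairs) x (fraction of candidates each pair can rule out),
    i.e. 2^{n/4-1} (2^{n/4} - 1) / (2^{n/2} + 1), computed exactly for n divisible by 4."""
    q = 2 ** (n // 4)
    pairs = Fraction(q * (q - 1), 2)
    return pairs / (2 ** (n // 2) + 1)


# The union bound leaves more than half of the candidate subspaces for every n checked.
assert all(precluded_fraction_bound(n) < Fraction(1, 2) for n in range(4, 201, 4))

# Brute-force checks for n = 4, where candidate subspaces have dimension n/2 = 2.
subs = subspaces(4, 2)
v = 0b0001
assert Fraction(sum(v in S for S in subs), len(subs)) == Fraction(1, 2 ** 2 + 1)
# Each candidate S_n is orthogonal to exactly one candidate T_n, namely its dual.
assert all(sum(all(dot(s, t) == 0 for s in S for t in T) for T in subs) == 1 for S in subs)
```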

3.3.6 Probabilistic quantum polytime, PQ[k]P

For completeness, we can also define syntactic classes PQ[$k$]P in an analogous fashion, by dropping the requirement that the bias be non-negligible.

Definition 3.3.7. In the terminology of Def. 3.3.3,

$$\mathcal{L}_{k,\mathcal{W},w} := \{ x \in \{0,1\}^* : i = \mathrm{len}(x),\ \mathrm{Tr}[W(i) \cdot \rho_{\mathrm{start}}(k, w) \cdot W(i)^\dagger \cdot Z_1] > 0 \}.$$

The syntactic class PQ[$k$]P contains all such $\mathcal{L}_{k,\mathcal{W},w}$ for that value of $k$.

Relaxing the probability bounds in this manner results in far greater computational 

power. 

Proposition 3.3.8. The classes PQ[fc]P are all equal to PP, for all polynomially 

bounded k > 1. 

Proof. PQ[1]P C PQ[A;]P C PP follows directly from standard results (cf. [4]), 
so it suffices to show that PP C PQ[1]P. To see this, we simply apply the trace 
estimation algorithm of Knill and Laflamme [41] to the unitary that defines an 
arbitrary efficiently computable Boolean function. 

Let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ be a function computable in classical polynomial time, let
$V_{f[1..n]} = \sum_x (-1)^{f(x)} |x\rangle\langle x|$, and let $W_{[1..w]} = H_1 \cdot \Lambda_1(V_{f[2..w]}) \cdot H_1$. Then apply $W$ to the state
$\rho_{\mathrm{start}}(1,w)$, where $w$ is the width of the circuit that implements $W$. When the
first qubit is measured in the computational basis, it will be $|1\rangle$ with probability
$2^{-n} \cdot \#\{ x : f(x) = 1 \}$, as required for PP. □
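The trace-estimation step can be simulated directly for a few qubits. This sketch assumes that $\rho_{\mathrm{start}}(1,w)$ is $|0\rangle\langle 0|$ on the first qubit tensored with the maximally mixed state on the remaining $n$ qubits (the usual one-clean-qubit input), and takes $w = n + 1$ with no extra workspace; the function name is hypothetical.

```python
import numpy as np


def prob_first_qubit_one(f_values):
    """Probability of reading |1> on qubit 1 after W = H_1 . Lambda_1(V_f) . H_1
    acts on rho_start(1, w), modelled here (an assumption) as |0><0| on qubit 1
    tensored with the maximally mixed state on the other n = log2(len(f_values))."""
    dim = len(f_values)
    V = np.diag([(-1.0) ** int(fx) for fx in f_values])      # V_f = sum_x (-1)^f(x) |x><x|
    zero = np.zeros((dim, dim))
    ctrl_V = np.block([[np.eye(dim), zero], [zero, V]])      # Lambda_1(V_f), qubit 1 first
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    H1 = np.kron(H, np.eye(dim))
    W = H1 @ ctrl_V @ H1
    rho = np.kron(np.diag([1.0, 0.0]), np.eye(dim) / dim)
    out = W @ rho @ W.T                                       # W is real, so W.T = W^dagger
    proj_one = np.kron(np.diag([0.0, 1.0]), np.eye(dim))
    return float(np.trace(proj_one @ out))


f_values = [0, 1, 1, 0, 1, 0, 0, 0]      # a hypothetical f on n = 3 bits
p = prob_first_qubit_one(f_values)       # equals #{x : f(x) = 1} / 2^n up to rounding
```

On each computational basis state $|0\rangle|x\rangle$ the circuit $W$ simply writes $f(x)$ into the first qubit, so averaging over the maximally mixed register yields the fraction of inputs with $f(x) = 1$.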
Chapter 4



The Fourier Hierarchy 



Classical computation permutes a discrete set of states (cf. §1.1.2), whereas quantum
computation (despite the name) allows for a more continuous notion of state
evolution. Therefore perhaps one can make quantum computation 'seem' like more 
of a natural extension of its classical counterpart by limiting to gates of a discrete 
group. This Chapter is concerned with the study of groups of transformations 
that fix the computational basis, e.g. the group generated by gates from the set 
$\{X, \Lambda(X), \Lambda^2(X)\}$. There are several different ways in which one can think of combining
reversible circuits built from basis-preserving gates of this kind. For example,
one might take the output of one such circuit, rotate each qubit in some prescribed 
fashion, and input this to the next circuit for further processing. (We call this 
quantum adaption, because the data being passed from one circuit to the next — 
determining the next phase of computation — is entirely quantum.) This idea leads 
to the Fourier hierarchy of quantum complexity classes, introduced by Shi in [63]. It 
provides us with a measure of quantum computation complexity that has to do with 
the branching and recombination of computational paths from the perspective of 
a canonical computational basis, and therefore allows (loosely speaking) for a kind 
of comparison with classical complexity that appeals to a classical-centric way of 
thinking. By interleaving 'classical' circuits with quantum basis-changes, resource 
requirements for a quantum computer (running with, say, polynomial spatial and 
temporal resources) can be quantified with more granularity, by asking both about
the complexity of the 'classical' (non-branching) parts and about the
number of basis-changes employed.

A more limited way of interfacing such circuits together would be to measure the 
output of one circuit in some pre-specified basis, and then use the resulting classical 
data as classical control on the gates of the next circuit, whose quantum input should 
be 'trivial' in some appropriate sense. (We call this classical adaption, because the 
data being passed from one circuit to the next is entirely classical.) This idea 
leads to the definition of a Fourier Sampling Oracle, as discussed in [14]. Such a
computing paradigm acquires its power from the fact that the quantum states input 
to a circuit — as well as the basis in which output measurements are taken — can be 
different from the computational basis. 

Kitaev showed [39] that the 'core part' of Shor's algorithm [64] need not be ex- 
pressed in terms of some Fourier transform directly related to the group being stud- 
ied; rather, he developed the technique of eigenvalue estimation to solve the Abelian 
Stabiliser Problem, which generalises many of the problems that can be solved using 
Fourier techniques. This means that the family of problems that seem to depend on 
Fourier techniques for their efficient solution (such as integer factorisation, compu- 
tation of discrete logarithms, the abelian hidden subgroup problem, solving Pell's 
equation, and so on [36, 32, 33]) can be rendered efficiently without recourse to
'complicated' Quantum Fourier Transforms. We integrate Kitaev's algorithm with 
the approach taken here, and modify the control of the algorithm slightly in order 
to simplify the classical post-processing. While this, on its own, does not seem to 
lead to a practical speed-up for solving problems, it does go some way to 'demystify- 
ing' such algorithms, hopefully making them more accessible to further investigation 
and development. In other words, by requiring all of the 'work' of computation to
be performed within 'classical' circuits (encoding essentially no complexity within
unitaries that are not simply permutations of the computational basis), it is hoped
that it will be easier to understand which parts of an algorithm might be optimised,
parallelised, or otherwise simplified, especially when adapting an algorithm
to target a marginally different problem. The 'naturalness' of restricting to classical 
gates and Hadamard gates for analysing aspects of complexity has been noted by 
many authors (see especially [10] for recent work on algebraic circuits). In particular,
in [21] it is shown that simpler proofs exist for $\mathrm{BQP} \subseteq \mathrm{PP}$ when this approach
is taken. The ideas of this Chapter motivate a similar analysis in Chapter 5 of a 
different discrete group. 

We begin with some basic definitions and observations, discussing the role of adaption
in defining the Fourier hierarchy classes $\mathrm{FH}_k$, $\mathrm{FH}'_k$, and $\mathrm{BPP}^{\mathcal{FS}}$, and considering
the various ways in which quantum circuits implementing classical logic might be
interfaced. We show how these classes are related, and where they are likely to differ.
Then we go on to consider Kitaev's algorithm for eigenvalue estimation, which
belongs naturally in $\mathrm{FH}_2$, and consider the control schedule for that algorithm in
some detail. We use this to show the new result that at least one cryptanalytically
significant problem also belongs in $\mathrm{BPP}^{\mathcal{FS}}$ (Theorem 4.2.4), which is tantamount
to saying that it can be rendered without the use of any ancilla workspace.

In §4.2.3, we discuss extensions to these ideas, showing that other related problems 
might not be solvable without ancillae. We briefly consider the trade-off between use 
of ancillae and circuit depth, and end by showing that continuous-group problems
such as the solution of Pell's equation can also be rendered using Kitaev's scheme
in $\mathrm{FH}_2$. It is hoped that this understanding and analysis of the Fourier hierarchy
will help with the future classification and development of quantum algorithms and 
subroutines. 

4.1 Definitions 

Throughout, global phases are ignored. This means that wherever it is well-defined 
to do so, we shall conflate a matrix group with its projective equivalent (quotienting
by $\mathbb{C}^*$).

4.1.1 Basic definitions 

Definitions of 'classical' gates 

The perspective taken in this chapter is to regard quantum circuitry as a natural 
extension of classical (reversible) circuitry. For this reason, it is convenient to fix 
a computational basis as usual, and then label certain quantum gates as 'classical' 
because they fix that particular basis. This expression "classical" is not to be understood
as saying anything about an incapacity for such gates to create or modify
superposition or entanglement; rather, it is a basis-dependent property that describes
how such gates collectively stabilise the computational basis.

Our first definition covers all permutations of the computational basis of an n-qubit 
machine, generated by 'generalised Toffoli' gates. 

Definition 4.1.1. The Permutation Group associated to a system of n qubits is 
generated by the set of generalised Toffoli gates : 

$\text{Permutation Group} := \langle \Lambda^j(X) : j \in [0..n-1] \rangle \cong \mathrm{Sym}(2^n). \qquad (4.1)$

This group is represented by the permutation matrices, constructed over $\mathbb{C}$ in general.
The cardinality of the group is $(2^n)!$. (If we were instead to limit to $\Lambda^2(X)$ =
Toffoli gates, $\Lambda^1(X)$ = C-Not gates, and $\Lambda^0(X)$ = $X$ gates, then only the alternating
subgroup would be generated, having cardinality $(2^n)!/2$; so appending a separate
ancilla qubit $|0\rangle$ would be a way to restore the entire permutation group without
resorting to 'large' gates.)
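The parity obstruction behind this remark can be checked directly. On $n$ qubits, $\Lambda^j(X)$ swaps $2^{n-1-j}$ pairs of basis states, so it is an even permutation of the $2^n$ basis states whenever $j \le n - 2$, and only the $(n-1)$-fold controlled gate is odd; for $n = 3$ the Toffoli gate would itself be odd, so the remark implicitly concerns $n \ge 4$. The sketch below (hypothetical helper names) checks parity only; it does not show that the small gates generate the whole alternating group.

```python
def generalised_toffoli(n, controls, target):
    """Permutation of {0, ..., 2^n - 1} induced by Lambda^{len(controls)}(X):
    flip bit `target` exactly when every bit listed in `controls` is 1."""
    perm = []
    for x in range(2 ** n):
        if all((x >> c) & 1 for c in controls):
            perm.append(x ^ (1 << target))
        else:
            perm.append(x)
    return perm


def parity(perm):
    """0 for an even permutation, 1 for an odd one (a cycle of length L
    contributes L - 1 transpositions)."""
    seen = [False] * len(perm)
    transpositions = 0
    for start in range(len(perm)):
        length = 0
        x = start
        while not seen[x]:
            seen[x] = True
            x = perm[x]
            length += 1
        if length:
            transpositions += length - 1
    return transpositions % 2


n = 4
assert parity(generalised_toffoli(n, [], 0)) == 0           # X
assert parity(generalised_toffoli(n, [1], 0)) == 0          # C-Not
assert parity(generalised_toffoli(n, [1, 2], 0)) == 0       # Toffoli
assert parity(generalised_toffoli(n, [1, 2, 3], 0)) == 1    # Lambda^3(X) is odd
```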

A more general definition, which still avoids the introduction of complex phases for 
the superposition phenomenon, is represented by the group of all signed permutation 
matrices, and is the semidirect product of real orthogonal diagonal matrices with
permutation matrices. 

Definition 4.1.2. The Classical Group associated to a system of n qubits is gener- 
ated by the Permutation Group together with generalised controlled-Z gates : 

$\text{Classical Group} := \langle \Lambda^j(X), \Lambda^j(Z) : j \in [0..n-1] \rangle \cong (\mathbb{Z}/2\mathbb{Z})^{2^n} \rtimes \mathrm{Sym}(2^n). \qquad (4.2)$

Again, n counts all qubits in a circuit, the full circuit width. The size of the group 
is $2^{2^n} \cdot (2^n)!$, if we count global phase. As with the permutation group, an alternative
construction for simulating this group makes use of a small ancilla while limiting
individual gates to three qubits. (Because it is abelian, we write the group $\mathbb{Z}/2\mathbb{Z}$
additively, rather than multiplicatively as Cyc(2) or Sym(2).)

Any element $U$ of the classical group can be factored uniquely into a permutation $f \in
\mathrm{Sym}(2^n)$ followed by a 'diagonal' operator $\sigma \in (\mathbb{Z}/2\mathbb{Z})^{2^n}$, because of the structure as
a semidirect product, and so we can sensibly write $U = (\sigma, f)$ to abbreviate line (4.3)
below.

$U : |x\rangle \mapsto (-1)^{\sigma(f(x))}\, |f(x)\rangle \qquad (4.3)$

Note that the map $U = (\sigma, f) \mapsto f$ is a group homomorphism, and so if a circuit is
given for $U$, then the gates of the circuit that implement the $f$ part form a
well-defined subset: indeed they are just those gates from the permutation group.
But the map $U = (\sigma, f) \mapsto \sigma$ is not a group homomorphism (the classical group is
not a direct product), and so the 'complexity' apparent in the $\sigma$ part can be owing
to the gates that implement $f$ as much as to any other part of the circuit.
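The composition rule for pairs $(\sigma, f)$ follows directly from line (4.3): if $(\tau, g)$ acts first, the product is $(\rho, f \circ g)$ with $\rho(y) = \sigma(y) + \tau(f^{-1}(y))$. The permutation parts compose on their own, but the diagonal part of the product depends on $f$, which is exactly why $U \mapsto \sigma$ fails to be a homomorphism. The sketch below checks this rule against matrix multiplication; the function names are hypothetical.

```python
import random

import numpy as np


def to_matrix(sigma, f):
    """Matrix of U = (sigma, f): |x> -> (-1)^{sigma(f(x))} |f(x)>, as at line (4.3)."""
    N = len(f)
    U = np.zeros((N, N))
    for x in range(N):
        U[f[x], x] = (-1) ** sigma[f[x]]
    return U


def compose(u1, u2):
    """Pair (rho, h) with to_matrix(rho, h) == U1 @ U2, where U2 acts first."""
    sigma, f = u1
    tau, g = u2
    N = len(f)
    h = [f[g[x]] for x in range(N)]                          # permutation parts just compose
    f_inv = [0] * N
    for x in range(N):
        f_inv[f[x]] = x
    rho = [(sigma[y] + tau[f_inv[y]]) % 2 for y in range(N)]  # diagonal part twisted by f
    return rho, h


def random_element(N):
    f = list(range(N))
    random.shuffle(f)
    return [random.randint(0, 1) for _ in range(N)], f


random.seed(1)
N = 8                                                        # three qubits
u1, u2 = random_element(N), random_element(N)
rho, h = compose(u1, u2)
assert np.allclose(to_matrix(rho, h), to_matrix(*u1) @ to_matrix(*u2))
```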

The broad motivation for these definitions comes not from physical considerations 
pertinent to the task of fabricating a quantum information processor, but from 
the desire to analyse a fairly natural-looking measure of circuit complexity that 
is not apparent within the standard model, viz. the number of global Hadamard
transformations ($Q_B$, defined below) needed, when quantum circuitry is seen as
directly extending classical circuitry. 

Definitions of basis-change 

We consider the Binary Quantum Fourier Transform, denoted $Q_B$, also called the
(global) Hadamard transform. Because we sometimes wish to think of it as a passive
transform, acting not as a gate but rather by conjugating subsequent gates or measurements,
we take it to act on every qubit in a computing system.
Definition 4.1.3. The Binary QFT is given by

$Q_B := H^{\otimes n}, \qquad (4.4)$

where $n$ counts all the qubits in a circuit.

As a gate, it acts on a unitary space of dimension $2^n$, and is defined by its action
on the computational basis as follows, interpreting labels $s$ and $t$ as vectors in $\mathbb{F}_2^n$:

$Q_B : |t\rangle \mapsto 2^{-n/2} \sum_{s} (-1)^{s \cdot t}\, |s\rangle; \qquad (4.5)$

it has order 2, and hence is an involution.
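Line (4.5) and the involution property can be confirmed numerically for small $n$. In this sketch basis labels are read as $n$-bit integers, so $s \cdot t$ over $\mathbb{F}_2$ is the parity of the bitwise AND; the helper name is hypothetical.

```python
from functools import reduce

import numpy as np


def binary_qft(n):
    """Q_B = H tensored with itself n times, as a 2^n x 2^n matrix."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    return reduce(np.kron, [H] * n)


n = 3
QB = binary_qft(n)
for s in range(2 ** n):
    for t in range(2 ** n):
        dot = bin(s & t).count("1") % 2             # s . t over F_2
        assert np.isclose(QB[s, t], (-1) ** dot / 2 ** (n / 2))
assert np.allclose(QB @ QB, np.eye(2 ** n))         # order 2
```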

As a passive action conjugating gates or measurements, it preserves locality of the
operator algebra, effectively just exchanging Pauli $X$ operators with Pauli $Z$ operators.
(Other Fourier transforms, such as the Integer Fourier Transform associated
to the ring $\mathbb{Z}/2^n\mathbb{Z}$, do not share this property of preserving locality, and ought
presumably to be regarded as essentially more complex for that reason.)
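The locality claim can be seen concretely: conjugation by $Q_B$ sends $X$ on qubit $j$ to $Z$ on the same qubit (and vice versa), so a $k$-local operator stays $k$-local. A minimal check for three qubits, with a hypothetical helper `on_qubit`:

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)


def on_qubit(gate, j, n):
    """`gate` acting on qubit j (0-indexed) of n qubits, identity elsewhere."""
    return reduce(np.kron, [gate if k == j else I2 for k in range(n)])


n = 3
QB = reduce(np.kron, [H] * n)
for j in range(n):
    assert np.allclose(QB @ on_qubit(X, j, n) @ QB, on_qubit(Z, j, n))
    assert np.allclose(QB @ on_qubit(Z, j, n) @ QB, on_qubit(X, j, n))
```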

Simulating single-qubit Hadamards 

Proposition 4.1.4. An Hadamard gate can be emulated from a gate-set containing
all small gates from the permutation group (Def. 4.1.1), together with the conjugates
of those gates by $Q_B$ (computational basis input is assumed); and hence such a
gate-set is universal for BQP.

Proof. We can employ a simple technique from the idea of spin chains (cf. §2.2)
to render a local $H$ operation on each of two qubits $a$ and $b$, while simultaneously
swapping over their data, simply by using 'classical' gates and two applications
of $Q_B$:

$\Lambda_a(Z_b) \cdot Q_B \cdot \Lambda_a(Z_b) \cdot Q_B \cdot \Lambda_a(Z_b) = H_a \cdot H_b \cdot \mathrm{Swap}_{ab}. \qquad (4.6)$

With the incorporation of two ancillae, $|1\rangle_u |{-}\rangle_v$, it is easy to render the same operation
using only permutation gates and two applications of $Q_B$. The following construction
emulates the previous one, preserving the ancillae:

$\Lambda^2_{ab}(X_v) \cdot Q_B \cdot \Lambda^2_{ab}(X_u) \cdot Q_B \cdot \Lambda^2_{ab}(X_v)\; |1\rangle_u |{-}\rangle_v \cong H_a \cdot H_b \cdot \mathrm{Swap}_{ab}. \qquad (4.7)$

Of course, the Swap gate itself is also an element of the permutation group. 
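Both swap identities can be verified numerically. For line (4.7) the sketch reads the outer Toffoli gates as targeting $v$ and the middle one as targeting $u$: each $Q_B$ exchanges the roles $|1\rangle \leftrightarrow |{-}\rangle$ of the two ancillae, so each Toffoli then acts on whichever ancilla currently holds $|{-}\rangle$ and kicks back a $\Lambda_a(Z_b)$ phase. The tensor ordering $(a, b, u, v)$ and the helper names are assumptions of this sketch.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P1 = np.diag([0.0, 1.0])                         # |1><1|
SWAP = np.eye(4)[[0, 2, 1, 3]]                   # exchanges |01> and |10>
HH_SWAP = np.kron(H, H) @ SWAP                   # H_a . H_b . Swap_ab

# Line (4.6) on qubits (a, b).
CZ = np.diag([1.0, 1.0, 1.0, -1.0])              # Lambda_a(Z_b)
QB2 = np.kron(H, H)
assert np.allclose(CZ @ QB2 @ CZ @ QB2 @ CZ, HH_SWAP)


def op(**gates):
    """Tensor product on qubits (a, b, u, v) with the given 2x2 matrices placed
    on the named qubits and the identity elsewhere."""
    return reduce(np.kron, [gates.get(q, I2) for q in "abuv"])


def toffoli_ab(target):
    """Lambda^2_{ab}(X_target) = 1 + |11><11|_ab (X_target - 1)."""
    return op() + op(a=P1, b=P1, **{target: X}) - op(a=P1, b=P1)


# Line (4.7): rightmost gate acts first.
QB4 = op(a=H, b=H, u=H, v=H)
lhs = toffoli_ab("v") @ QB4 @ toffoli_ab("u") @ QB4 @ toffoli_ab("v")

one = np.array([0.0, 1.0])
minus = np.array([1.0, -1.0]) / np.sqrt(2)
rng = np.random.default_rng(7)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
state = reduce(np.kron, [psi, one, minus])              # |psi>_ab |1>_u |->_v
expected = reduce(np.kron, [HH_SWAP @ psi, one, minus])
assert np.allclose(lhs @ state, expected)               # ancillae come back unchanged
```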

We can even drop the requirement for an Hadamard-basis ancilla to be provided,
because one can be constructed directly. The gate $Q_B \cdot \Lambda^2_{ab}(X_c) \cdot Q_B$ may
also be written $\mathbb{1} - 2\,|{-}{-}1\rangle\langle{-}{-}1|_{abc}$, and so applying it to $|001\rangle_{abc}$ one obtains

the non-trivial superposition state $(|00\rangle - |{-}{-}\rangle)\,|1\rangle$. Form two copies of such a state;

together these must be related by some permutation of the computational basis
to a state that contains separately a $|+\rangle$ state amongst its qubits:
(4.8)



Apply such a permutation, ignore the remaining qubits besides the $|+\rangle$, and apply
$Z = Q_B \cdot X \cdot Q_B$ to it in order to obtain the state $|{-}\rangle$, for subsequent use as an
ancilla.

Since it is well known that Hadamard and Toffoli gates suffice for universality, it
follows that permutation gates together with their conjugates by $Q_B$ are sufficient
for implementing a universal gate set for BQP. □
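The ancilla construction in this proof can be checked numerically. The sketch below confirms that $Q_B \cdot \Lambda^2_{ab}(X_c) \cdot Q_B$ maps $|001\rangle$ to $(|00\rangle - |{-}{-}\rangle)|1\rangle$, and then exhibits one explicit basis permutation taking two copies of $|00\rangle - |{-}{-}\rangle$ to $|+\rangle \otimes |\varphi\rangle$. That permutation is built here by pairing basis states whose amplitudes have equal sign; it is an illustration, not necessarily the permutation displayed at line (4.8).

```python
from functools import reduce

import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
QB = reduce(np.kron, [H, H, H])
toffoli = np.eye(8)[[0, 1, 2, 3, 4, 5, 7, 6]]        # Lambda^2_{ab}(X_c), order a, b, c
ket001 = np.zeros(8)
ket001[1] = 1.0

minus = np.array([1.0, -1.0]) / np.sqrt(2)
psi = np.array([1.0, 0.0, 0.0, 0.0]) - np.kron(minus, minus)   # |00> - |-->
state = QB @ toffoli @ QB @ ket001
assert np.allclose(state, np.kron(psi, [0.0, 1.0]))           # (|00> - |-->) |1>

# Two copies of psi: 16 amplitudes, ten equal to +1/4 and six equal to -1/4.
two = np.kron(psi, psi)
pos = [i for i in range(16) if two[i] > 0]
neg = [i for i in range(16) if two[i] < 0]
pairs = ([pos[i:i + 2] for i in range(0, len(pos), 2)]
         + [neg[i:i + 2] for i in range(0, len(neg), 2)])

# Send the k-th equal-sign pair to |0>|k> and |1>|k>: a permutation of basis states.
perm = np.empty(16, dtype=int)
for k, (i, j) in enumerate(pairs):
    perm[i], perm[j] = k, 8 + k
permuted = np.zeros(16)
permuted[perm] = two

plus = np.array([1.0, 1.0]) / np.sqrt(2)
phi = np.sqrt(2) * permuted[:8]
assert sorted(perm) == list(range(16))                          # a genuine permutation
assert np.allclose(permuted, np.kron(plus, phi))               # first qubit is |+>
```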

4.1.2 Definition of Fourier hierarchy 

Following [63], the Fourier hierarchy is defined in terms of the number of time- 
slices within which Hadamard gates are used within a computation that is otherwise 
'classical'. 

Definition 4.1.5. A language $\mathcal{L}$ belongs to $\mathrm{FH}_k$ if it is decided with bounded probability
by a uniform family of circuits that have Hadamard gates within at most $k$
time-slices, computational-basis-preserving gates otherwise, and computational-basis
ancillae.

This definition should be understood as meaning that the way in which one decides 
whether some string $x$ of length $\ell$ is in $\mathcal{L}$ is by applying a circuit $C_\ell$ from a uniform
family to the computational-basis state $|x\rangle|0\rangle$, where the size of the ancilla register
$|0\rangle$ would depend only on $\ell$ and be bounded by some polynomial. Moreover, the
decision would rest on the value of a single qubit (allowing for bounded probability), 
measured in the computational basis. It is generally understood that the allowed 
'classical' gates are those from the permutation group (Def. 4.1.1) that affect at most 
a constant number of qubits, e.g. three. 

We also consider a slightly different version of the Fourier hierarchy, obtained by
disallowing individual Hadamard gates, instead allowing only the $Q_B$ operation that
spans the entire computer. We let $\mathrm{FH}'$ denote the so-called strict Fourier hierarchy.

Definition 4.1.6. A language $\mathcal{L}$ belongs to $\mathrm{FH}'_k$ if it is decided with bounded probability
by a uniform family of circuits that use only gates up to three qubits wide from
the classical group (Def. 4.1.2), together with at most $k$ uses of the $Q_B$ operation,
allowing also for computational-basis ancillae.

By Proposition 4.1.4, this latter hierarchy is not so very different, since individual
$H$ operations can be simulated by appropriate use of $Q_B$ and ancillae.
Corollary 4.1.7. $\mathrm{FH}_k \subseteq \mathrm{FH}'_{2k}$.

Proof. Given a circuit that uses Hadamard gates in $k$ time-slices, we can readily
construct an equivalent $\mathrm{FH}'_{2k}$ circuit by using the substitution indicated at line (4.6).

□ 

Note that $\mathrm{FH}'_1 = \mathrm{FH}'_0 = \mathrm{FH}_0 = \mathrm{P}$ (cf. [63]). The reason for the first equality is that
there is no utility in having exactly one use of $Q_B$, because given the constraints on
input and output, nothing can be computed in that case. Formally, the magnitude of
$\langle\phi|\, C_1 \cdot Q_B \cdot C_2\, |\psi\rangle$ is completely independent of $C_1$ and $C_2$ (it always equals $2^{-n/2}$) if they are both unitaries
in the classical group and $|\phi\rangle$ and $|\psi\rangle$ are both in the computational basis.
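The reason is that a classical-group unitary sends a computational basis state to $\pm$ another basis state, while every entry of $Q_B$ has magnitude $2^{-n/2}$. A quick randomised check with three qubits, where signed permutation matrices stand in for arbitrary classical-group elements (helper name hypothetical):

```python
from functools import reduce

import numpy as np

rng = np.random.default_rng(3)
n = 3
N = 2 ** n
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
QB = reduce(np.kron, [H] * n)


def random_classical_group_element():
    """A random signed permutation matrix, i.e. some U = (sigma, f)."""
    f = rng.permutation(N)
    signs = rng.choice([-1.0, 1.0], size=N)
    U = np.zeros((N, N))
    U[f, np.arange(N)] = signs[f]          # U|x> = (-1)^{sigma(f(x))} |f(x)>
    return U


for _ in range(20):
    C1, C2 = random_classical_group_element(), random_classical_group_element()
    phi = np.eye(N)[rng.integers(N)]
    psi = np.eye(N)[rng.integers(N)]
    amplitude = phi @ C1 @ QB @ C2 @ psi
    assert np.isclose(abs(amplitude), 2 ** (-n / 2))
```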

4.1.3 Definition of Fourier sampling oracle 

Another way of understanding this kind of extension of classical computing to quantum
computing uses the idea of oracular access to $\mathrm{FH}_k$. It turns out that in this
way of thinking, $\mathrm{FH}'$ is not quite the simplest non-classical computing model that
we can consider. Accordingly, we define the Fourier Sampling Oracle.

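Let $U = (\sigma, f)$ be a unitary map on $w$ qubits, given by a circuit using gates from
the classical group, as at line (4.3), where $f : \mathbb{F}_2^w \to \mathbb{F}_2^w$ is a permutation, and
$\sigma : \mathbb{F}_2^w \to \mathbb{F}_2$ is a Boolean function. Let $P_U$ be the probability distribution on
domain $\mathbb{F}_2^w$ that ascribes weights as follows:

$P_U(y) := 2^{-2w} \sum_{x,\, d \,\in\, \mathbb{F}_2^w} (-1)^{\sigma(x) + \sigma(x+d) + d \cdot y}. \qquad (4.9)$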

Definition 4.1.8. In the notation given above, the Fourier Sampling Oracle, denoted
$\mathcal{FS}$, is a device which, on input a classical description of a circuit for some such $U$,
returns a single sample (from $\mathbb{F}_2^w$) from the corresponding distribution $P_U$.
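For tiny $w$, an $\mathcal{FS}$ oracle can be simulated by brute force: apply $U$ to $|+\rangle^{\otimes w}$ and measure in the Hadamard basis, i.e. read out $Q_B \cdot U \cdot Q_B |0\ldots0\rangle$. The sketch below (exponential in $w$; all names hypothetical) compares that simulation with the double sum $2^{-2w}\sum_{x,d}(-1)^{\sigma(x)+\sigma(x+d)+d\cdot y}$ and checks that the permutation part $f$ has no effect on the distribution.

```python
from functools import reduce

import numpy as np


def dot(a, b):
    """Inner product over F_2 of two bit strings encoded as integers."""
    return bin(a & b).count("1") % 2


def fs_formula(sigma, w):
    """P_U(y) = 2^{-2w} sum_{x, d} (-1)^{sigma(x) + sigma(x + d) + d . y}."""
    N = 2 ** w
    return np.array([
        sum((-1) ** (sigma[x] + sigma[x ^ d] + dot(d, y))
            for x in range(N) for d in range(N)) / N ** 2
        for y in range(N)
    ])


def fs_simulated(sigma, f, w):
    """Distribution of Q_B . U . Q_B |0...0> for U = (sigma, f)."""
    N = 2 ** w
    U = np.zeros((N, N))
    for x in range(N):
        U[f[x], x] = (-1) ** sigma[f[x]]
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    QB = reduce(np.kron, [H] * w)
    return np.abs(QB @ U @ QB[:, 0]) ** 2


def sample_fs(sigma, f, w, rng):
    """One simulated call to FS: a single w-bit sample from P_U."""
    p = fs_simulated(sigma, f, w)
    return int(rng.choice(2 ** w, p=p / p.sum()))


rng = np.random.default_rng(0)
w = 3
sigma = [int(b) for b in rng.integers(0, 2, size=2 ** w)]   # hypothetical Boolean function
f = [int(v) for v in rng.permutation(2 ** w)]
p_formula = fs_formula(sigma, w)
assert np.allclose(p_formula, fs_simulated(sigma, f, w))
assert np.allclose(p_formula, fs_simulated(sigma, list(range(2 ** w)), w))   # f plays no role
y = sample_fs(sigma, f, w, rng)
```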

Note that the distribution $P_U$ depends on the function $\sigma$ but is independent of
the function $f$. One might as well therefore consider taking $f$ to be the identity,
by composing $U$ with $(0, f^{-1})$ to get $(\sigma, 1)$. A circuit for $(0, f^{-1})$ can readily be
identified from a circuit for $U$ (just pick out the permutation gates and reverse
their order), but the composition to make a circuit for $(\sigma, 1)$ will still contain
permutation gates, which cannot generally be rearranged to 'cancel' one another
without increasing the number of 'diagonal' gates exponentially.

Parallel calls to $\mathcal{FS}$ can be emulated by a single call to $\mathcal{FS}$, because of the identity

$Q_B \cdot (U \otimes V) \cdot Q_B = (Q_B \cdot U \cdot Q_B) \otimes (Q_B \cdot V \cdot Q_B). \qquad (4.10)$
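Because $Q_B$ on $n + m$ qubits is just $Q_B$ on $n$ qubits tensored with $Q_B$ on $m$ qubits, line (4.10) is the mixed-product property of the tensor product. The consequence used in the text is that one sample for $U \otimes V$ is a pair of independent samples for $U$ and $V$. A minimal check, with arbitrary example unitaries (hypothetical helper names):

```python
from functools import reduce

import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)


def qb(width):
    return reduce(np.kron, [H] * width)


def signed_permutation(sigma, f):
    """U = (sigma, f) as at line (4.3)."""
    N = len(f)
    U = np.zeros((N, N))
    for x in range(N):
        U[f[x], x] = (-1) ** sigma[f[x]]
    return U


def fs_distribution(W, width):
    """Outcome distribution of Q_B . W . Q_B |0...0>."""
    return np.abs(qb(width) @ W @ qb(width)[:, 0]) ** 2


U = signed_permutation([0, 1, 1, 0], [2, 0, 3, 1])                           # two qubits
V = signed_permutation([1, 0, 0, 0, 1, 1, 0, 1], [5, 3, 0, 7, 1, 6, 2, 4])   # three qubits

lhs = qb(5) @ np.kron(U, V) @ qb(5)
rhs = np.kron(qb(2) @ U @ qb(2), qb(3) @ V @ qb(3))
assert np.allclose(lhs, rhs)                                                  # line (4.10)
assert np.allclose(fs_distribution(np.kron(U, V), 5),
                   np.kron(fs_distribution(U, 2), fs_distribution(V, 3)))
```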

$\mathrm{BPP}^{\mathcal{FS}}$ is an interesting object of study if only because it can be defined quite
independently of quantum mechanics, and therefore without reference to the actual
creation or simulation of quantum states, processes, or circuits. Note that the success
probability of an algorithm in $\mathrm{BPP}^{\mathcal{FS}}$ can be boosted toward unity without
increasing the number of calls to the $\mathcal{FS}$ oracle, simply by parallel instantiation
during the $\mathcal{FS}$ part, followed by majority voting afterwards. (This would not be
true if $Q_B$ were replaced by some more general Fourier transform not satisfying the
locality condition of line (4.10), which is another reason for preferring to use the
simpler $Q_B$ in these definitions.)
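The classical half of this boosting argument is ordinary majority voting. The sketch below only computes the success probability of a majority vote over $k$ independent runs, each correct with probability $p > \frac12$; it assumes the runs are independent (as line (4.10) provides when they are packed into one call) and does not model the oracle itself.

```python
from math import comb


def majority_success(p, k):
    """Probability that a majority of k independent runs (k odd), each correct
    with probability p, gives the correct answer."""
    return sum(comb(k, i) * p ** i * (1 - p) ** (k - i)
               for i in range(k // 2 + 1, k + 1))


for k in (1, 3, 9, 27, 81):
    print(k, majority_success(2 / 3, k))    # increases toward 1 as k grows
```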

It is clear from the definitions that $\mathrm{FH}'_2$ works out to be the class of problems that
can be solved with bounded error using a single call to $\mathcal{FS}$, with deterministic pre-processing
and subsequent post-processing. More generally, we write $\mathrm{BPP}^{\mathcal{FS}[k]}$ to denote classical
computation with up to $k$ adaptive calls to $\mathcal{FS}$, with randomised pre- and
post-processing permitted, and this too can be regarded as forming a hierarchy of
complexity classes. However, it is by no means apparent that even a single call to an
oracle for $\mathrm{FH}_3$, say, might be simulable by polynomially many calls to $\mathcal{FS}$. Therefore,
the 'hierarchy' of what can be computed in BPP with increasingly many calls
to $\mathcal{FS}$ is quite plausibly strictly contained within BQP. That is to say, it would be
surprising if $\mathrm{BQP} \subseteq \mathrm{BPP}^{\mathcal{FS}[\mathrm{poly}]}$. Yet to prove this separation rigorously would
of course involve separating BQP from BPP.

Proposition 4.1.9. $\mathrm{FH}'_2 \subseteq \mathrm{BPP}^{\mathcal{FS}[1]} \subseteq \mathrm{FH}_2$.

Proof. Immediate from the definitions. □ 

In §4.2 we consider problems in $\mathrm{FH}_2$. The main result of that section shows why
some of these problems are also in $\mathrm{BPP}^{\mathcal{FS}}$.
4.1.4 Adaption

Informally speaking, one might say about algorithmics within the strict hierarchy
$\mathrm{FH}'$ that it is a way of 'gluing together classical subroutines quantumly', by interleaving
'classical' circuits with the $Q_B$ operator. We call this quantum adaption,
because quantum data is being passed from one 'classical' circuit to the next to 
drive the computation. 

Similarly, one might say of $\mathrm{BPP}^{\mathcal{FS}}$ that it uses classical adaption, because there
is no transfer of quantum data between oracle calls. (The oracle calls themselves do 
involve 'classical' circuits in some sense, and also do involve quantum computing, 
but there is no actual flow of quantum data between distinct classical circuits.) 

A third kind of adaption that we mention, for completeness, is one whereby both 
quantum and classical data are explicitly passed from one 'classical' circuit to the 
next, in a serial manner. This we call mixed adaption, and in this case the $Q_B$
operation is no longer needed. Proposition 4.1.10 below illustrates another way of 
conceptualising FH, in terms of mixed adaption. 

Proposition 4.1.10. Limiting circuits to use only classically-controlled gates (§3.3.4)
from the permutation group, inputs from the Hadamard basis, intermediate single-qubit
measurements in the Hadamard basis, and feed-forward of classical measurement
data to subsequent classical control, one can efficiently emulate Hadamard
transforms (and hence ultimately BQP-universality).
Figure 4.1: Using permutations and mixed adaption to simulate an Hadamard gate. The
green wire denotes a classical control signal to an X gate, following a measurement in the 
Hadamard basis. A formula for this figure is given in line (4.11). 



Proof. The formula is straightforward, and can be seen directly in Fig. 4.1. Write
$X_b^{|-\rangle\langle-|_c}$ to denote applying gate $X$ to qubit $b$ conditional on qubit $c$ having been
measured to be in the image of the projector $|{-}\rangle\langle{-}|$. Then the formula for emulating
an Hadamard gate is given symbolically as

$\Lambda_b(X_a) \cdot X_b^{|-\rangle\langle-|_c} \cdot \Lambda^2_{bc}(X_a) \cdot \Lambda^2_{ac}(X_b) \cdot |{+}\rangle_a |{-}\rangle_b |\psi\rangle_c \;\cong\; H_a |\psi\rangle_a |{-}\rangle_b. \qquad (4.11)$

To see why this works, take $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$. Then the starting state on the three
qubits may be written (unnormalised, with basis states ordered as $|abc\rangle$) as the vector of
amplitudes $(\alpha, \beta, -\alpha, -\beta, \alpha, \beta, -\alpha, -\beta)$. After the first Toffoli gate, this becomes
$(\alpha, \beta, -\alpha, -\beta, \alpha, -\beta, -\alpha, \beta)$. After the second Toffoli gate it becomes
$(\alpha, \beta, -\alpha, \beta, \alpha, -\beta, -\alpha, -\beta)$. Measure the third qubit in the Hadamard basis and it becomes either
$(\alpha+\beta, -\alpha+\beta, \alpha-\beta, -\alpha-\beta)\,|{+}\rangle$ or $(\alpha-\beta, -\alpha-\beta, \alpha+\beta, -\alpha+\beta)\,|{-}\rangle$. Apply $X_b$
controlled on the measurement result, and this gives $(\alpha+\beta, -\alpha+\beta, \alpha-\beta, -\alpha-\beta)$ in either case,
up to global phase. Apply the final C-Not gate to obtain $(\alpha+\beta, -\alpha-\beta, \alpha-\beta, -\alpha+\beta)$, which is equivalent
to $H_a |\psi\rangle_a |{-}\rangle_b$. Since it works for all pure $|\psi\rangle$, by linearity it must work for all
quantum data. □
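The amplitude walk-through above can be replayed as a state-vector simulation, for both measurement outcomes. The sketch follows the proof's conventions as read here: $|\psi\rangle$ starts on qubit $c$, the ancillae are $|+\rangle_a|-\rangle_b$, basis states are ordered as $|abc\rangle$, the first Toffoli is $\Lambda^2_{ac}(X_b)$, the second is $\Lambda^2_{bc}(X_a)$, and the final C-Not is $\Lambda_b(X_a)$. The helper names are hypothetical.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)


def toffoli(c1, c2, t):
    """Lambda^2_{c1 c2}(X_t) on qubits a, b, c = 0, 1, 2 (basis index 4a + 2b + c)."""
    P = np.zeros((8, 8))
    for x in range(8):
        bits = [(x >> 2) & 1, (x >> 1) & 1, x & 1]
        if bits[c1] and bits[c2]:
            bits[t] ^= 1
        P[4 * bits[0] + 2 * bits[1] + bits[2], x] = 1.0
    return P


def emulate_hadamard(psi, outcome):
    """Run the gadget on |+>_a |->_b |psi>_c, given the Hadamard-basis outcome
    on c (0 for |+>, 1 for |->); return the normalised state of (a, b)."""
    state = np.kron(np.kron(plus, minus), psi)
    state = toffoli(0, 2, 1) @ state              # Lambda^2_{ac}(X_b)
    state = toffoli(1, 2, 0) @ state              # Lambda^2_{bc}(X_a)
    m = plus if outcome == 0 else minus
    ab = state.reshape(4, 2) @ m                  # project c onto |m> and discard it
    ab = ab / np.linalg.norm(ab)
    if outcome == 1:
        ab = np.kron(np.eye(2), X) @ ab           # classically controlled X_b
    cnot_b_to_a = np.eye(4)[[0, 3, 2, 1]]         # Lambda_b(X_a): swaps |01> and |11>
    return cnot_b_to_a @ ab


rng = np.random.default_rng(5)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
target = np.kron(H @ psi, minus)                  # H_a |psi>_a |->_b
for outcome in (0, 1):
    overlap = np.vdot(target, emulate_hadamard(psi, outcome))
    assert np.isclose(abs(overlap), 1.0)          # equal up to a global phase
```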

If one were ever to discover a paradigm for quantum computing within which gates 
from the permutation group were fast to implement, but where other gates were not 
feasible, and where measurements and feed-forward of classical data were slow, then 
perhaps the Fourier hierarchy FH would be an ideal way of measuring algorithmic 
complexity within such a paradigm. 

Note also that one could easily adapt the proof of Proposition 4.1.10 to use post-selection
in place of measurement and feed-forward, to prove that classical computing
with a single call to $\mathcal{FS}$ and post-selective post-processing is universal for
PostBQP, which is PP (cf. §3.2).
Proposition 4.1.11. $\mathrm{BPP}^{\mathcal{FS}[1]}$ with post-selection gives rise to PostBQP.

Proof. Without loss of generality, consider starting with a BQP circuit composed
of Hadamard gates, Toffoli gates and $X$ gates, beginning with $|0\rangle$ input and then
a $Q_B$ operation, and ending with another $Q_B$ operation before measurement (and
post-selection) in the computational basis.

Every Hadamard gate in this circuit (besides the ones comprising the initial and
final $Q_B$ operations) should then be replaced by the gadget of line (4.11); but the
measurement involved within that gadget should be delayed until the end of the
computation, and correspondingly the classically controlled gate in the gadget should
be omitted. This leaves us with a circuit having Hadamard-basis input, Hadamard-basis
measurements at the end, and otherwise all gates from the permutation group,
and such a circuit is of the correct form for an $\mathcal{FS}$ oracle, as per line (4.9). At
the end of the computation, when the $\mathcal{FS}$ oracle returns a string, the bits that
ought to have been used for feed-forward classical control (which are otherwise no
longer used) should be post-selected to have been qubits in state $|+\rangle$ (so that it was
correct to have dropped the controlled gates), which always happens with non-zero
amplitude by the proof of Proposition 4.1.10. □






4.2 Kitaev's Algorithm Revisited 

Kitaev's algorithm for the Abelian Stabiliser Problem [39] belongs naturally within
the category of $\mathrm{FH}_2$ computing, and certain applications (most notably in cryptography)
lead to the solution of problems in $\mathrm{BPP}^{\mathcal{FS}}$.

Theorem 4.2.1 (Kitaev). The decision variants of Integer Factorisation and the
Discrete Logarithm problem are in $\mathrm{FH}_2$.

In this section, we recall the core of Kitaev's algorithm, Eigenvalue Estimation,
describing it in terms of the $\mathcal{FS}$ oracle, and offer a slightly different 'control schedule'
for simplifying the follow-on post-processing. Using this, our main result of the
section is to establish that (the decision variants of) the cryptographic problem
"Discrete Log over Finite Fields of Characteristic 2" is in $\mathrm{BPP}^{\mathcal{FS}}$. We also discuss
other cryptographic problems in $\mathrm{FH}_2$ whose arithmetic is sufficiently complex that
it is not apparent whether or not they are also in $\mathrm{BPP}^{\mathcal{FS}}$. Unlike $\mathrm{FH}_2$, the oracle
$\mathcal{FS}$ admits no space for ancillae, so all the 'computational work' it performs is done
'in place': a very limited form of computing (cf. §3.3). We also consider the depth
of circuits required within the $\mathrm{FH}_2$ and $\mathrm{BPP}^{\mathcal{FS}}$ frameworks.

4.2.1 Eigenvalue estimation 

Let $f$ be a permutation on the set of strings of length $n$, and suppose that for any
'control integer' $c$ we can construct a circuit of width $n$ and size $O(\mathrm{poly}(n) \cdot \log c)$ for
implementing $f^c$, using gates from the permutation group. The goal of Eigenvalue
Estimation is to find the length of one of the larger cycles of $f$. This is called
Eigenvalue Estimation because the cycle-structure of $f$ is naturally encoded within
the spectrum of the unitary map corresponding to the circuit that implements $f$:
that is, to each cycle of length $q$ there corresponds a subspace of dimension $q$ spanned
by the computational basis elements associated to the elements of that cycle, and
in a different basis this space can be expressed as the direct sum of one-dimensional
eigenspaces having eigenvalues $\exp(2\pi i \kappa/q)$ for each $\kappa \in [0..q-1]$. In other words, for
$x$ some point on a $q$-cycle of $f$, the following two bases span the same space:

$\{\, |x\rangle,\ |f(x)\rangle,\ |f^2(x)\rangle,\ \ldots,\ |f^{q-2}(x)\rangle,\ |f^{q-1}(x)\rangle \,\},$

$\left\{\, |\chi_x(\kappa)\rangle := \frac{1}{\sqrt{q}} \sum_{j=0}^{q-1} \exp\!\left(\frac{2\pi i \kappa j}{q}\right) |f^j(x)\rangle \,\right\}_{\kappa=0}^{q-1}. \qquad (4.12)$

Eigenvalue Estimation is about finding both a $q$ and a $\kappa$ for some suitable permutation
$f$ in the context of some eigenvector $|\chi_x(\kappa)\rangle$, or possibly for two different commuting
permutations in the context of the same mutual eigenvector.
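The eigenvector structure of line (4.12) can be checked on a small permutation. One sign convention matters: if the unitary acts as $|y\rangle \mapsto |f(y)\rangle$, then $|\chi_x(\kappa)\rangle$ has eigenvalue $\exp(-2\pi i\kappa/q)$, which, as $\kappa$ ranges over $[0..q-1]$, gives the same set of eigenvalues $\exp(2\pi i\kappa/q)$ stated above. The example permutation and helper names are hypothetical.

```python
import numpy as np


def permutation_matrix(f):
    """Unitary U_f with U_f |y> = |f(y)>."""
    N = len(f)
    P = np.zeros((N, N))
    for y in range(N):
        P[f[y], y] = 1.0
    return P


def chi(f, x, kappa):
    """|chi_x(kappa)> from line (4.12); returns the vector and the cycle length q."""
    orbit = [x]
    while f[orbit[-1]] != x:
        orbit.append(f[orbit[-1]])
    q = len(orbit)
    v = np.zeros(len(f), dtype=complex)
    for step, y in enumerate(orbit):
        v[y] = np.exp(2j * np.pi * kappa * step / q) / np.sqrt(q)
    return v, q


f = [1, 2, 0, 4, 3, 5]          # cycles (0 1 2), (3 4), (5)
P = permutation_matrix(f)
for x in range(len(f)):
    q = chi(f, x, 0)[1]
    for kappa in range(q):
        v, _ = chi(f, x, kappa)
        # With U_f|y> = |f(y)>, the eigenvalue is exp(-2 pi i kappa / q).
        assert np.allclose(P @ v, np.exp(-2j * np.pi * kappa / q) * v)
```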
Using the $\mathcal{FS}$ oracle

To find one such $q$ using a single call to $\mathcal{FS}$, we need to construct a unitary $U$ to
submit to the oracle. Kitaev's idea is to implement $\Lambda(f^c)$, onto some essentially
arbitrary target, many times for many different values of $c$, using different control
qubits but the same target qubits. (One could say that this technique has the effect
of measuring the target state in the eigenvalue basis.) So we implement this idea
by fixing some schedule of 'control integers', a list $\{c_1, c_2, \ldots, c_m\}$, and consider a
circuit on $w = m + n$ qubits of the form

$U(0)_{[1..w]} := \Lambda_1\big(f^{c_1}_{[m+1..m+n]}\big) \cdot \Lambda_2\big(f^{c_2}_{[m+1..m+n]}\big) \cdots \Lambda_m\big(f^{c_m}_{[m+1..m+n]}\big). \qquad (4.13)$

Now this $U(0)$ applied to a state of the form $|+\rangle^{\otimes m}_{[1..m]}\, |\chi_x(\kappa)\rangle_{[m+1..m+n]}$ would leave
the $|\chi_x(\kappa)\rangle$ register unchanged, and would transform the $j$th qubit (for $j \in [1..m]$)
independently by rotating it around the equator of the Bloch sphere through an
angle of $2\pi c_j \kappa/q$. (This is sometimes called 'phase kickback'.) This transfers some
information about the eigenstate $|\chi_x(\kappa)\rangle$ into the $j$th qubit in a way that can be
measured. Moreover, by using different values of $c_j$ for different qubits, we obtain
different data about the eigenstate. We call the first $m$ qubits control qubits or the
control register, and we call the last $n$ qubits the target register.
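Phase kickback for a single control qubit can be simulated directly. This sketch uses a hypothetical 5-cycle and arbitrary values of $\kappa$ and $c$. It applies $\Lambda(f^c)$ to $|+\rangle|\chi_x(\kappa)\rangle$ and checks that the target is unchanged while the control picks up the relative phase $\exp(-2\pi i c\kappa/q)$ (sign as in the eigenvalue convention $|y\rangle \mapsto |f(y)\rangle$). It also confirms that the Hadamard-basis bias of the control is $\cos(2\pi c\kappa/q)$, which is the quantity used in the text below.

```python
import numpy as np


def perm_power_matrix(f, c):
    """Unitary for f^c, acting as |y> -> |f^c(y)>."""
    N = len(f)
    P = np.zeros((N, N))
    for y in range(N):
        z = y
        for _ in range(c):
            z = f[z]
        P[z, y] = 1.0
    return P


f = [1, 2, 3, 4, 0]                  # a single 5-cycle (hypothetical example)
q, x = 5, 0
kappa, c = 2, 3
orbit = [x]
for _ in range(q - 1):
    orbit.append(f[orbit[-1]])
chi = np.zeros(q, dtype=complex)
for step, y in enumerate(orbit):
    chi[y] = np.exp(2j * np.pi * kappa * step / q) / np.sqrt(q)

N = len(f)
ctrl = np.zeros((2 * N, 2 * N), dtype=complex)
ctrl[:N, :N] = np.eye(N)
ctrl[N:, N:] = perm_power_matrix(f, c)      # Lambda(f^c): act on target if control is |1>
plus = np.array([1.0, 1.0]) / np.sqrt(2)
out = ctrl @ np.kron(plus, chi)

phase = np.exp(-2j * np.pi * c * kappa / q)
kicked = np.array([1.0, phase]) / np.sqrt(2)
assert np.allclose(out, np.kron(kicked, chi))           # target unchanged
p_plus = abs(np.vdot(plus, kicked)) ** 2
assert np.isclose(2 * p_plus - 1, np.cos(2 * np.pi * c * kappa / q))
```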

So $U(0)$ is almost the unitary we want for submitting to the $\mathcal{FS}$ oracle, except that
the oracle would apply $U(0)$ to the state $|+\rangle^{\otimes w}$, which is not so useful: the target
register state $|+\rangle^{\otimes n}$ is a linear combination of eigenvectors of the form $|\chi_x(0)\rangle$, so
that for all of these the rotation angle is 0 (independent of $c_j$ and $q$) and hence the
phase kicked back is 1. We therefore adjust $U(0)$ by composing it with a random
pattern of $Z$ gates to make the unitary $U(r)$:

$U(r)_{[1..w]} := U(0)_{[1..w]} \cdot \prod_{t=1}^{n} Z_{m+t}^{r_t}, \qquad (4.14)$

where $r$ is a random $n$-bit string. Now if $U(r)$ is submitted to the oracle, the
effect would be that of applying $U(0)$ to a state whose control register is correctly
set ($|+\rangle^{\otimes m}$) and whose target register is effectively fully depolarised. Thus the final
measurement results will be no different from those obtained had we uniformly randomly selected
a point $x$ and uniformly randomly selected a number $\kappa$ and applied $U(0)$ with the
target register in the eigenstate $|\chi_x(\kappa)\rangle$.
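The depolarisation step can be checked on its own: averaging $Z^r |+\rangle^{\otimes n}\langle +|^{\otimes n} Z^r$ over all $2^n$ strings $r$ gives the maximally mixed state, because the states $Z^r|+\rangle^{\otimes n}$ run through the whole Hadamard basis. A minimal sketch for $n = 3$:

```python
import itertools
from functools import reduce

import numpy as np

n = 3
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)
plus_n = np.ones(2 ** n) / np.sqrt(2 ** n)         # |+>^n on the target register
rho_plus = np.outer(plus_n, plus_n)

average = np.zeros((2 ** n, 2 ** n))
for r in itertools.product([0, 1], repeat=n):
    Zr = reduce(np.kron, [Z if bit else I2 for bit in r])   # product of Z^{r_t}
    average += Zr @ rho_plus @ Zr.T
average /= 2 ** n

assert np.allclose(average, np.eye(2 ** n) / 2 ** n)       # fully depolarised
```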

(This illustrates quite nicely the point made back in §4.1.1 regarding the factorisations
of a 'classical group' unitary into a 'permutation' part and a 'diagonal' part.
For when the operator $U(r)$ is factorised as at line (4.14), the 'diagonal' part comes
first and is rather trivial, but if it were to be factored the other way round (as at
line (4.3), with the 'diagonal' part coming after the 'permutation' part), then the
resulting 'diagonal' part would be far more complicated.)

Then the $w$-bit string returned by the oracle will be such that amongst the first $m$
bits, the bias of the $j$th bit will be $\cos(2\pi c_j \kappa/q)$, where $q$ is the length of the cycle of $f$
containing a randomly chosen point and $\kappa$ is a random integer in $[0..q-1]$. We have therefore proved
the following lemma:

Lemma 4.2.2 (cf. [39]). For any uniform family $\{f\}$ of permutations with properties
as above (provided the descriptions for constructing the circuits for $f^c$ are
themselves uniform), there is a $\mathrm{BPP}^{\mathcal{FS}}$ subroutine which takes as input a description
for such circuits and a description of a control schedule ($\{c_j\}_{j=1}^m$) and outputs
a string the first $m$ bits of which have biases $\cos(2\pi c_j \kappa/q)$ for $j \in [1..m]$, where $q$
is the length of the orbit of a randomly chosen point in the domain of $f$, and $\kappa$ is a
random integer, neither of which depends on $j$.

Proof. Overview as above, calling the FS oracle with the U(r) of line (4.14), with 
additional explanatory details to be found in [39] and [47]. □ 

Note that if the random string $r$ used at line (4.14) to select a random pattern
of $Z$ gates were set to be uniform over all strings other than the all-zero string,
then this would diminish the probability of finding a case for which $\kappa = 0$, but it
would not eliminate that possibility altogether unless $f$ consisted of a single cycle
of length $2^n$.

Choosing a control schedule 

Next we consider how to choose the integer values $c_j$ that are used as indices on
$f$ in the subroutine of Lemma 4.2.2. These values must be specified up front, not
selected adaptively (for an algorithm in $\mathrm{BPP}^{\mathcal{FS}}$). In what follows, let $\phi = \kappa/q \in
[0, 1)$ denote the rational number that the algorithm is intended to find. Since
$\cos(2\pi c_j \phi) = \cos(2\pi c_j (1 - \phi))$ for all integer $c_j$, we may as well take $\phi$ to be in $[0, \frac12]$,
by symmetry. The $c_j$ values then control the probabilities of the bits returned by
the $\mathcal{FS}$ oracle, with $p_0 = \cos^2(\pi c_j \phi)$ and $p_1 = \sin^2(\pi c_j \phi)$ being the probabilities of
the $j$th returned bit being 0 or 1 respectively. These bits are to be post-processed
classically in order to learn the value $\phi$.

Kitaev suggested taking the $c_j$ values of the form $2^\alpha$, repeating each several times in order to get a very accurate estimate of $2^\alpha\phi$ to a few bits of precision. But as we have already seen, the bits being returned give information about $\sin^2(\pi c_j\phi)$, not about $c_j\phi$ directly. By way of example, suppose that $\phi$ were rather close to $\frac14$, say $\phi = \frac14 + \epsilon$; so close that measurement of $\sin^2(\pi 2^0\phi) = \sin^2(\pi(\frac14+\epsilon))$ to a few bits of precision would be unable to determine the sign of $\epsilon$ with any significant accuracy. But then subsequent measurements of $\sin^2(\pi 2^\alpha\phi)$ for any integer $\alpha \geq 1$ will contain absolutely no information about the sign of $\epsilon$, because $\sin^2(\pi 2^\alpha(\frac14+\epsilon)) = \sin^2(\pi 2^\alpha(\frac14-\epsilon))$. Thus a bit of information about $\phi$ would be inaccessible in this case, and without further modification, the algorithm would not work in the worst case with this control schedule.
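The loss of sign information can be checked numerically. The short Python sketch below is only an illustration (the helper name `p_one` and the offset `eps = 0.01` are our own choices): it evaluates the probability $\sin^2(\pi c\phi)$ of reading a 1 at $\phi = \frac14 \pm \epsilon$. The two values differ for $c = 1$, coincide for every $c = 2^\alpha$ with $\alpha \ge 1$, and differ again once a factor of 3 is introduced.

```python
import math


def p_one(c, phi):
    """Probability sin^2(pi*c*phi) that the control bit with schedule value c reads 1."""
    return math.sin(math.pi * c * phi) ** 2


eps = 0.01  # hypothetical small offset from 1/4
for a in range(5):
    c = 2 ** a
    print(f"c = {c:2d}: {p_one(c, 0.25 + eps):.6f} vs {p_one(c, 0.25 - eps):.6f}")
# A factor of 3 breaks the symmetry that powers of 2 alone cannot.
print(f"c =  3: {p_one(3, 0.25 + eps):.6f} vs {p_one(3, 0.25 - eps):.6f}")
```

For $\alpha \ge 1$ the printed pairs agree to floating-point precision, which is exactly the degeneracy described above.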

Our proposed solution for a choice of control schedule, designed so as to interface with the subroutine of Lemma 4.2.2 without further modification, is to use integers of the form $2^\alpha 3^\beta$.

Lemma 4.2.3. For any $\epsilon > 0$ there is a set of integers $\{c_j\}_{j=1}^m$ and an efficient classical algorithm that succeeds with probability $1-\epsilon$, such that for any rational $\phi\in[0,\frac12]$ with denominator $< 2^n$, the algorithm outputs $\phi$ when input a sequence of $m$ independent bits with respective biases $\cos(2\pi c_j\phi)$ (as per Lemma 4.2.2). The algorithm takes time roughly linear in $m$, where $m = O(n^2\log\frac1\epsilon)$, and the bit-length of each $c_j$ is $O(n)$.

Proof. First we give the control schedule $\{c_j\}_{j=1}^m$ precisely, and the algorithm, and then we argue for its correctness.

Let $\alpha$ take all integer values in the range $[0..t]$ and let $\beta$ take all integer values in the range $[0..d]$, where $t = 2n$ and $d = O(n)$. For each pair $(\alpha,\beta)$, let there be $k$ different indices $j$ for which $c_j = 2^\alpha 3^\beta$, where $k = O(\log\frac1\epsilon)$. Then $m = k\cdot(t+1)\cdot(d+1)$ is the total size of the control register.

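Denote the binary expansion of the unknown $\phi$ out to $t$ bits of precision as $\phi = 0.0\phi_2\phi_3\phi_4\ldots$ (the leading bit $\phi_1$ may be taken to be 0, since $\phi\le\frac12$). Fix a parameter $\eta \approx 0.32$. Process the random bits by letting $\mu_{2^\alpha 3^\beta}$ be the average of those $k$ bits for which $c_j = 2^\alpha 3^\beta$; that is to say

$$\mu_c \sim \frac1k\cdot\mathrm{Bin}\big(k,\ \sin^2(\pi c\phi)\big). \qquad (4.15)$$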

The pseudocode for the ensuing estimation procedure is given in Fig. 4.2.

Having obtained such an estimate for $\phi$ (assuming no errors), one can use the usual efficient technique of continued fractions [39] to recover the rational $\phi$ exactly.

The technique used in Fig. 4.2 can be understood inductively. Take the inductive hypothesis to be that, on entering the outer loop for the $(\alpha+1)$st time, the first $\alpha+1$ bits of $\phi$ are correctly known (i.e. bits $\phi_1,\phi_2,\ldots,\phi_{\alpha+1}$). In fact, line 3 of Fig. 4.2 explicitly shows the bit $\phi_{\alpha+1}$ being stored: at each point in the algorithm, the variable $\sigma$ holds the bit value deemed to be most likely for the next (unstored) output bit. Then the inner loop examines successively more data, in a bid to determine the parity $\phi_{\alpha+1}\oplus\phi_{\alpha+2}$, from which the next output bit is learnt.

The role of the factor $3^\beta$ in $c_j$ is to deal with the case whereby the estimator for the next bit ($\mu_{2^\alpha}\sim\sin^2(\pi 2^\alpha\phi)$) happens to be inconclusive for further progress, i.e. when it is too close to $\frac12$ to distinguish reliably between the two alternative hypotheses for the parity $\phi_{\alpha+1}\oplus\phi_{\alpha+2}$.

Figure 4.2: Eigenvalue estimation: classical post-processing.

Input: $\mu_{2^\alpha 3^\beta}\sim\frac1k\mathrm{Bin}\big(k,\sin^2(\pi 2^\alpha 3^\beta\phi)\big)$, for $\alpha$ up to $t$, for $\beta$ up to $d$.
Output: Estimate for hidden $\phi = 0.0\phi_2\phi_3\ldots$ up to $t$ bits of precision.
Params: $k$, $d$ and $\eta$ all affect worst-case failure probabilities.

1. $\sigma\leftarrow 0$;
2. for $\alpha$ in $[0..t]$ do
3.     $\phi_{\alpha+1}\leftarrow\sigma$; $\sigma\leftarrow 1-\sigma$;
4.     for $\beta$ in $[0..d]$ do
5.         $\sigma\leftarrow 1-\sigma$;
6.         if $\mu_{2^\alpha 3^\beta} < \eta$ then continue $\alpha$;
7.         if $\mu_{2^\alpha 3^\beta} > 1-\eta$ then $\sigma\leftarrow 1-\sigma$, continue $\alpha$;
8.         continue $\beta$;
9.     continue $\alpha$;

Estimators of the form $\mu_{2^\alpha 3^\beta}$ are ideal for that case. This is because as soon as it is assured that $2^\alpha\phi \in (\frac18,\frac38)$ (modulo 1), the estimator $\mu_{2^\alpha 3^1}$ is effective for 'zooming in' to help determine on which side of $\frac14$ the value $2^\alpha\phi$ is more likely to lie (and likewise for the symmetrically opposite case; see Fig. 4.3 for a visual aid). Larger values of $\beta$ then give improved 'magnification'.




Figure 4.3: Schematic for $\sin^2(\pi c\phi)$. If a value $c\phi$ lies within the range $[\frac18,\frac38]$ (shaded area), then the best way to tell which side of $\frac14$ it lies on is to triple it (unshaded area), magnifying the resolution.



The parameter $\eta$ is set to be well within the interior of the region $[0.146, 0.5]$ (note that $\sin^2(\pi/8)\approx 0.146$), so in terms of Fig. 4.3, each round of the inner loop can be thought of as testing whether $c\phi$ lies well away from the 'equatorial' neighbourhoods of $\frac14$ and $\frac34$, where the parity $\phi_{\alpha+1}\oplus\phi_{\alpha+2}$ would still be ambiguous. While this ambiguity persists, the inner loop increases the value of $c$ by a factor of 3, driving $c\phi$ away from these neighbourhoods, until eventually the required parity bit is correctly learnt with high probability.
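As a cross-check of Fig. 4.2, the following is a minimal Python sketch of the whole classical pipeline, under assumptions of our own. The estimators $\mu_c$ are simulated (either exactly, or by drawing $k$ Bernoulli samples per $c$ as in line (4.15)) rather than obtained from an $\mathcal{FS}$ oracle. The nine numbered lines of Fig. 4.2 appear as comments, with "continue $\alpha$" realised as a `break` out of the inner loop. The continued-fractions step is performed with Python's `Fraction.limit_denominator`, which returns the closest fraction whose denominator is at most $2^n - 1$. All function names are hypothetical.

```python
import math
import random
from fractions import Fraction


def sample_mu(phi, t, d, k=None, rng=None):
    """Estimators mu_c for c = 2**a * 3**b, a in [0..t], b in [0..d], keyed by (a, b).

    Each mu_c is the mean of k bits equal to 1 with probability sin^2(pi*c*phi), as in
    line (4.15); with k=None the exact probabilities are returned (noise-free data).
    """
    mu = {}
    for a in range(t + 1):
        for b in range(d + 1):
            c = 2 ** a * 3 ** b
            p1 = math.sin(math.pi * float((c * phi) % 1)) ** 2  # reduce c*phi mod 1 exactly
            if k is None:
                mu[(a, b)] = p1
            else:
                mu[(a, b)] = sum(rng.random() < p1 for _ in range(k)) / k
    return mu


def estimate_bits(mu, t, d, eta=0.32):
    """Classical post-processing of Fig. 4.2; returns the bits phi_1, ..., phi_{t+1}."""
    bits = []
    sigma = 0                                # line 1
    for a in range(t + 1):                   # line 2
        bits.append(sigma)                   # line 3: store phi_{a+1} ...
        sigma = 1 - sigma                    # ... and pre-flip sigma
        for b in range(d + 1):               # line 4
            sigma = 1 - sigma                # line 5: tripling reverses the orientation
            if mu[(a, b)] < eta:             # line 6: parity judged to be 0
                break                        # 'continue alpha'
            if mu[(a, b)] > 1 - eta:         # line 7: parity judged to be 1
                sigma = 1 - sigma
                break                        # 'continue alpha'
            # line 8: inconclusive, so try the next beta
        # line 9: move on to the next alpha
    return bits


def recover(bits, n):
    """Truncated estimate 0.phi_1 phi_2 ..., then the closest fraction with denominator < 2**n."""
    estimate = sum(Fraction(bit, 2 ** (i + 1)) for i, bit in enumerate(bits))
    return estimate.limit_denominator(2 ** n - 1)


n, d = 8, 6  # d chosen so that 3**d > 2**n
t = 2 * n
print(recover(estimate_bits(sample_mu(Fraction(5, 13), t, d), t, d), n))  # 5/13
```

Passing, say, `k=500` and a seeded `random.Random` to `sample_mu` exercises the noisy case. With $t = 2n$, the $t+1$ returned bits locate $\phi$ to within $2^{-(t+1)}$, which is less than half the minimum gap between distinct fractions with denominators below $2^n$; this is why the nearest such fraction is $\phi$ itself.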

To understand the success probability of the algorithm and the role of the parameters $k$, $d$, $\eta$, it is appropriate to consider the probability of the algorithm making a misassignment of the $(\alpha+2)$nd bit of $\phi$, given that all prior assignments were correct. (Note that if $2^{\alpha+2}\phi$ happens to be an integer precisely, then there are two equally valid possibilities for the $(\alpha+2)$nd bit of $\phi$; and this causes no practical problem.)

There are four places where the algorithm shown can go wrong: namely lines 6, 7, 8 and 9 in the code of Fig. 4.2. It makes an invalid assumption at line 6 if $\mu_c < \eta$ when in fact $\sin^2(\pi c\phi) > \frac12$. The marginal probability of this happening on any particular $c$ is never worse than $\exp(-2k(\frac12-\eta)^2)$, using the Chernoff bound. Similarly, the algorithm makes an invalid assumption at line 7 if it sees $\mu_c > 1-\eta$ when really $\sin^2(\pi c\phi) < \frac12$. Again, the marginal probability is as above, by symmetry. Thus the overall probability of either of these two kinds of error occurring at any point in the algorithm, regardless of $\phi$, is certainly bounded above by

$$(t+1)\cdot(d+1)\cdot\exp\big(-2k\,(\tfrac12-\eta)^2\big). \qquad (4.16)$$

This is exponentially small in $k$, for fixed $\eta$.

The algorithm makes a potentially invalid assumption at line 8 if it is not justified in looking at the next $\beta$ value, because $\sin^2(\pi c\phi)$ lies outside the range $[\sin^2(\frac\pi8)..\sin^2(\frac{3\pi}8)]$ despite the fact that $\mu_c$ lies within $[\eta..1-\eta]$. As before, this probability is bounded above at any given point by a Chernoff bound of $\exp(-2k(\eta-\sin^2(\frac\pi8))^2)$. So the overall probability of this kind of error occurring at any point in the algorithm is bounded above by

$$(t+1)\cdot(d+1)\cdot\exp\big(-2k\,(\eta-\sin^2(\tfrac\pi8))^2\big), \qquad (4.17)$$

which is also exponentially small in $k$.

The fourth way in which the algorithm can make an error (line 9) is by 'using up' all the available $\beta$ values for a given $\alpha$, without coming to a firm conclusion about the parity of the next bit. This may be precluded for the rational $\phi$ that we consider by choosing $d$ so that $3^d$ is sufficiently large compared with the denominator $q$; this is surely achieved if $3^d > 2^n$, for example. (Heuristically however, to avoid errors with exponentially good probability in the average case, $d$ really only needs to be at least as long as the longest run of zeroes or ones in the first $t$ bits of $\phi$, which suggests asymptotically taking $d = \Omega(\log t)$.) □
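The bounds (4.16) and (4.17) translate directly into parameter choices. The Python sketch below assumes $t = 2n$, the schedule of Lemma 4.2.3 and $\eta = 0.32$. It picks the smallest $d$ with $3^d > 2^n$ (ruling out line-9 errors as above), then the smallest $k$ for which the sum of the two union bounds is at most a target $\epsilon$. The function names are hypothetical.

```python
import math

SIN2_PI_8 = math.sin(math.pi / 8) ** 2  # about 0.146


def failure_bound(n, k, d, eta=0.32):
    """Sum of the union bounds (4.16) and (4.17), with t = 2n."""
    points = (2 * n + 1) * (d + 1)
    lines_6_7 = points * math.exp(-2 * k * (0.5 - eta) ** 2)
    line_8 = points * math.exp(-2 * k * (eta - SIN2_PI_8) ** 2)
    return lines_6_7 + line_8


def choose_parameters(n, eps, eta=0.32):
    """Smallest d with 3**d > 2**n, then smallest k meeting eps; returns (d, k, m)."""
    d = 0
    while 3 ** d <= 2 ** n:
        d += 1
    k = 1
    while failure_bound(n, k, d, eta) > eps:
        k += 1
    m = k * (2 * n + 1) * (d + 1)  # total size of the control register
    return d, k, m


print(choose_parameters(8, 1e-6))
```

Because both bounds decay like $e^{-\Theta(k)}$, the resulting $k$ grows only like $\log\frac{(t+1)(d+1)}{\epsilon}$, in line with $k = O(\log\frac1\epsilon)$ for fixed $n$.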
4.2.2 Discrete logarithm over $\mathbb{F}_{2^n}$

Suppose we wish to find the discrete logarithm between two elements of a finite field of characteristic 2. Let $g,h\in\mathbb{F}_{2^n}^*$ be the two elements, so that we seek a solution to $g^s = h$. (The group $\mathbb{F}_{2^n}^*$ is cyclic, of known cardinality: there are $2^n-1$ units in the finite field.) In general, this problem is believed to be classically hard.

Kitaev's algorithm for the discrete logarithm involves learning $\kappa/q$ for two different permutations with respect to the same eigenvector. The two permutations in this context are multiplication by $g$ and multiplication by $h$, respectively. The map analogous to the $U(0)$ of line (4.13) is here to be defined so that the first half of the control bits (say bits $[1..\frac m2]$) control applications of powers of the first permutation (i.e. multiplication by $g^{c_j}$), while the second half of the control bits ($[\frac m2+1..m]$) control applications of powers of the second permutation (i.e. multiplication by $h^{c_j}$). Then, adjusting Lemma 4.2.2 accordingly, we can arrange for an output string the first $\frac m2$ bits of which have biases of the form $\cos(2\pi c_j\kappa/q)$ and the second $\frac m2$ bits of which have biases of the form $\cos(2\pi c_j\kappa s/q)$, for the same $\kappa$ and $q$ (and for $c_j$ of our choosing).
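Once the estimation of §4.2.1 has returned both rationals, recovering $s$ is elementary number theory. The sketch below makes two assumptions: the order $q = 2^n - 1$ is known, and the two outputs are exactly $\kappa/q$ and $\kappa s/q \bmod 1$, given as Python `Fraction`s. When $\gcd(\kappa, q) > 1$ it returns `None`, since $s$ is then only determined modulo $q/\gcd(\kappa,q)$ and a fresh $\kappa$ is needed. The function name is hypothetical.

```python
from fractions import Fraction
from math import gcd


def recover_exponent(phi_g, phi_h, q):
    """Return s mod q from phi_g = kappa/q and phi_h = kappa*s/q (mod 1), or None."""
    kappa, kappa_s = phi_g * q, phi_h * q
    if kappa.denominator != 1 or kappa_s.denominator != 1:
        raise ValueError("inputs are not multiples of 1/q")
    kappa, kappa_s = int(kappa), int(kappa_s)
    if gcd(kappa, q) != 1:
        return None  # s only determined modulo q // gcd(kappa, q); rerun with a fresh kappa
    return kappa_s * pow(kappa, -1, q) % q  # modular inverse needs Python 3.8+


# Example in the multiplicative group of F_256: q = 255, hidden s = 77, random kappa = 13.
q, s, kappa = 255, 77, 13
print(recover_exponent(Fraction(kappa, q), Fraction(kappa * s % q, q), q))  # 77
```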

Theorem 4.2.4. The decision version of the discrete log problem over finite fields of characteristic 2 is in $\mathrm{BPP}^{\mathcal{FS}}$.

Proof. To apply the ideas of this section to the discrete log problem requires that we have a concrete way of representing $\mathbb{F}_{2^n}$ using $n$-bit strings, and also a way of implementing the appropriate permutations using $n$-bit-wide permutation circuits, without ancillas. The permutations in question are various powers of multiplication by $g$ or $h$ within the representation of $\mathbb{F}_{2^n}$. Now these various powers can all be precomputed (there are $m$ of them, and $m$ is bounded by a polynomial in $n$; cf. Lemma 4.2.3), and each is simply multiplication by a constant. Multiplication by a constant, in any standard representation of the field, is an $\mathbb{F}_2$-linear transformation. This means that over $\mathbb{F}_2$ it can be represented as multiplication by an $n$-by-$n$ matrix. By performing Gaussian elimination on such a matrix, one can factor it into a pair of triangular matrices, and thence construct a circuit of Not gates and C-Not gates, of quadratic complexity [20], for implementing it in place. All this pre-processing can be rendered classically in polynomial time (and hence within the present framework), and thus a suitable input to the $\mathcal{FS}$ oracle can be prepared, in the same way as was done for Lemma 4.2.2 (see line (4.13)).

Finally, by learning $\kappa/q$ and $\kappa s/q$ for some random (non-zero) $\kappa$, it is easy (with classical post-processing) to recover $s$, which is the sought-after discrete log. □
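The pre-processing in this proof can be sketched concretely. The Python below assumes the representation $\mathbb{F}_{2^8} = \mathbb{F}_2[x]/(x^8+x^4+x^3+x+1)$ (the AES polynomial, chosen only for illustration). It builds the $8\times 8$ matrix over $\mathbb{F}_2$ of multiplication by a constant, then compiles it into in-place C-Not gates by plain Gauss-Jordan elimination using row additions only. This is a simple variant, not the triangular factorisation of [20], but it likewise uses at most $n^2$ gates and no ancillas. Function names are hypothetical.

```python
def gf_mul(a, b, n=8, modulus=0x11B):
    """Multiply a and b in GF(2^n) = GF(2)[x]/(modulus); the default is the AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= modulus  # reduce modulo the field polynomial
    return result


def multiplication_matrix(g, n=8, modulus=0x11B):
    """Rows of the GF(2) matrix M with M x = g * x; bit j of row i is M[i][j]."""
    columns = [gf_mul(g, 1 << j, n, modulus) for j in range(n)]
    return [sum(((columns[j] >> i) & 1) << j for j in range(n)) for i in range(n)]


def cnot_synthesis(rows):
    """Gauss-Jordan elimination using row additions only.

    Each step rows[target] ^= rows[control] corresponds to CNOT(control -> target).
    If the steps E_1, ..., E_k reduce M to the identity then M = E_1 ... E_k, so the
    circuit must apply E_k first: the recorded list is returned reversed.
    """
    a = list(rows)
    n = len(a)
    ops = []
    for col in range(n):
        if not (a[col] >> col) & 1:
            pivot = next(r for r in range(col + 1, n) if (a[r] >> col) & 1)
            a[col] ^= a[pivot]
            ops.append((pivot, col))
        for r in range(n):
            if r != col and (a[r] >> col) & 1:
                a[r] ^= a[col]
                ops.append((col, r))
    return ops[::-1]


def apply_circuit(ops, x):
    """Apply CNOT gates (control, target), in order, to the basis state encoded by x."""
    for control, target in ops:
        if (x >> control) & 1:
            x ^= 1 << target
    return x


ops = cnot_synthesis(multiplication_matrix(0x57))
print(len(ops), all(apply_circuit(ops, x) == gf_mul(0x57, x) for x in range(256)))
```

The second value printed should be `True`: the compiled C-Not circuit agrees with field multiplication on every 8-bit input.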
4.2.3 Extensions and future work

This section describes why it is not clear whether all related problems can be solved in $\mathrm{BPP}^{\mathcal{FS}}$, why it is not clear how well general problems in FH2 parallelise, what the mathematical relationship is between the algorithm of eigenvalue estimation in FH2 and the so-called Partition Problem, and how the algorithm generalises to the continuous context.

Generalising to other Abelian Hidden Subgroup problems 

Suppose as before that $f$ is a permutation on the set of strings of length $n$, but now suppose that we need some clean ancilla space if, for any integer $c$, we are to construct a circuit of size $O(\mathrm{poly}(n)\cdot\log c)$ for implementing $f^c$ using gates from the permutation group. In this (more realistic) case, Lemma 4.2.2 will not apply as it stands, because the only ancilla available in the context of an $\mathcal{FS}$ subroutine would be one prepared in the Hadamard basis. Nonetheless, it is still possible to solve eigenvalue estimation problems within $\mathrm{BPP}^{\mathcal{FS}}$ for such families of functions, provided that a logarithmically big ancilla suffices, and provided that the language being decided is itself in NP. This is because, as we saw in Chapter 3, one can compute using depolarised qubits, provided one is prepared to amplify success probabilities via an outer loop with a 'verifier' (i.e. run the algorithm many times and look at all answers before deciding whether to accept). Just as at line (4.14), we can prepare a depolarised ancilla by applying a random pattern of $Z$ gates to the Hadamard-basis ancilla. There will then be a non-negligible probability of the overall algorithm behaving as though a clean ancilla space had been provided.

So what happens when we try to solve Integer Factorisation (equivalently, compute Euler's totient function), or Discrete Logarithm over other finite fields? Although such problems are clearly solved within FH2 and although they certainly belong to NP (and so have polynomial-time verifiers for use in post-processing), it is still not apparent that they can be solved in $\mathrm{BPP}^{\mathcal{FS}}$, because it is not clear that one can find efficient circuits for implementing the required permutations in place even with log-sized ancilla space. It is clear that circuits of permutation gates must exist for this kind of in-place permutation, but are they polynomial in size, and can they be compiled in polynomial time?

For example, for Euler's totient function we would need to be able to design an efficient $n$-bit circuit of permutation gates for mapping integers $x\mapsto x\cdot m \pmod N$, where $N$ and $m$ are compile-time constants, $2^{n-1} < N < 2^n$, $0 < x < N$ and $0 < m < N$. Now the arithmetic in question may require only a little ancilla space to compute the individual bits of the output, but it is unclear where to write the bits of output as they are each computed so as not to corrupt the input before we are done using it! (This problem did not exist in our previous example for discrete logs over $\mathbb{F}_{2^n}$, because the underlying arithmetic there was seen to become remarkably simple after classical polynomial-time pre-processing.)

It remains as future work to determine which aspects of elementary arithmetic 
can be performed in-place using permutation circuits with log-sized clean ancilla 
space. 

Parallelisation for FH2 problems 

Høyer and Špalek [35] have interesting results about reducing the depth of certain quantum circuits to a constant, using so-called 'fan-out' gates of arbitrary width, or equivalently 'parity' gates of arbitrary width, both of which belong to the permutation group. Arithmetic performed with such optimisations will certainly require substantial ancilla space, and therefore will not lead to algorithms of the form required for $\mathrm{BPP}^{\mathcal{FS}}$. Can it nonetheless lead to significant parallelisation of problems in the FH2 framework using these 'wide gates'?

We did not find a way to make a good parallelisation, because the constructions of [35] additionally require gates of the form $\Lambda_a(e^{i\varphi Z_b})$ for various angles $\varphi$, and to emulate such gates using permutation circuits requires not only having permutations with appropriate cycle structure, but also being able to construct the eigenvectors of these permutations. It is therefore something of an open problem to find optimal circuits for arithmetic.

In [24] a uniform method is given for computing integer addition in place, in logarithmic depth, using permutation gates, with a linear-sized ancilla in the computational basis. The so-called "Carry-lookahead in-place adders" given there can implement quantum addition of the form $|a\rangle|b\rangle|0\rangle \mapsto |a\rangle|a+b\rangle|0\rangle$, or addition of a constant, of the form $|b\rangle|0\rangle\mapsto|a+b\rangle|0\rangle$, in the ring of integers without modular reduction (the sum being represented in one more bit than the summands). It is straightforward to adapt these circuits to render addition modulo a classical integer $N$ known at compile-time, without affecting depth or ancilla requirement by more than a constant factor. Analogous results for integer multiplication are not presently known.
Relation to Subset-Sum

Lemma 4.2.2 expressed the probability of measuring some $m$-bit string in terms of a stochastically chosen angle $\phi$ of the form $\kappa/q$: we found that

$$P(y) = \frac1q\sum_{\kappa=0}^{q-1}\prod_{j=1}^m\tfrac12\big(1+(-1)^{y_j}\cos(2\pi c_j\kappa/q)\big), \qquad (4.18)$$

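where $y$ is the $m$-bit string returned from the control register, assuming that all cycles have length $q$ exactly. But we can also compute this probability directly from the Born rule and Bayes's theorem, obtaining

$$\begin{aligned} P(y) &= \sum_z\sum_r 2^{-n}\,\big|\langle y,z|\,H^{\otimes(m+n)}\cdot U(r)\cdot H^{\otimes(m+n)}\,|0,0\rangle\big|^2 \\ &= \sum_z\sum_r 2^{-n}\,\Big|2^{-(m+n)}\sum_{s,s''}(-1)^{s\cdot y + f^{s\cdot c}(s'')\cdot z + s''\cdot r}\Big|^2 \\ &= 2^{-2m-n}\sum_{s,s',s''}(-1)^{(s+s')\cdot y}\,\big[f^{s\cdot c}(s'') = f^{s'\cdot c}(s'')\big] \\ &= \frac1{2^{2m}}\sum_{s,s'\in\{0,1\}^m}(-1)^{y\cdot(s+s')}\,\big[s\cdot c\equiv_q s'\cdot c\big], \end{aligned} \qquad (4.19)$$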

where $s\cdot c$ is being used as a shorthand for $\sum_{j=1}^m s_jc_j$, and so on. (The final bracket on the right side of the equation is a sort of Kronecker delta function: highly discontinuous. It takes the value 1 when the difference between $s\cdot c$ and $s'\cdot c$ is a multiple of $q$, and takes the value zero otherwise.)

Therefore, whenever $c$ is an integer tuple, for all positive integers $q$, we have the following lemma:

Lemma 4.2.5. For all $y\in\{0,1\}^m$,

$$\frac1q\sum_{\kappa=0}^{q-1}\prod_{j=1}^m\big(1+(-1)^{y_j}\cos(2\pi c_j\kappa/q)\big) = \frac1{2^m}\sum_{s,s'\in\{0,1\}^m}(-1)^{y\cdot(s+s')}\,\big[s\cdot c\equiv_q s'\cdot c\big].$$

Proof. We have already seen that this formula holds approximately whenever there exists a permutation on $2^n$ elements all of whose cycles have length $q$, and that the quality of the approximation increases without limit as the permutation considered tends more to be composed of length-$q$ cycles (cf. Lemma 4.2.2). Yet since neither side of the present equation depends on $n$, the formula must be exact. □
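Because Lemma 4.2.5 is an exact identity, it can be spot-checked by brute force for small $m$. The Python sketch below uses hypothetical helper names, and the weights and modulus are arbitrary illustrative choices. It evaluates both sides for every $y\in\{0,1\}^m$, reading $(-1)^{y\cdot(s+s')}$ as a parity and the bracket as the indicator of $s\cdot c\equiv s'\cdot c \pmod q$.

```python
import itertools
import math


def lhs(y, c, q):
    """(1/q) * sum_kappa prod_j (1 + (-1)^{y_j} cos(2 pi c_j kappa / q))."""
    total = 0.0
    for kappa in range(q):
        term = 1.0
        for yj, cj in zip(y, c):
            term *= 1 + (-1) ** yj * math.cos(2 * math.pi * cj * kappa / q)
        total += term
    return total / q


def rhs(y, c, q):
    """2^{-m} * sum_{s,s'} (-1)^{y.(s+s')} [s.c = s'.c mod q]."""
    m = len(c)
    total = 0
    for s in itertools.product((0, 1), repeat=m):
        for s2 in itertools.product((0, 1), repeat=m):
            sc = sum(a * b for a, b in zip(s, c))
            s2c = sum(a * b for a, b in zip(s2, c))
            if (sc - s2c) % q == 0:
                total += (-1) ** sum(yj * (a + b) for yj, a, b in zip(y, s, s2))
    return total / 2 ** m


c, q = (1, 3, 4, 6), 7  # illustrative weights and modulus
worst = max(abs(lhs(y, c, q) - rhs(y, c, q)) for y in itertools.product((0, 1), repeat=len(c)))
print(f"largest discrepancy: {worst:.2e}")
```

Dividing the left side by $2^m$ recovers $P(y)$ of line (4.18), so these values also sum to 1 over all $y$.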

This lemma then yields a corollary regarding (classical) randomized approximation schemes for counting solutions to modular partition problems (a variant of subset-sum), which may be of independent interest.

Corollary 4.2.6. Let $q$ be any positive integer modulus, and let $c = \{c_j\}_{j=1}^m$ be a set of integer weights. Let $s$ denote a uniformly selected $m$-bit string to select a subset of weights, and let $\bar s$ denote its complement. Let $\kappa$ denote a uniformly selected element of $\mathbb{Z}/q\mathbb{Z}$. Then

$$\mathbb{E}_\kappa\Big[\prod_{j=1}^m\cos(2\pi c_j\kappa/q)\Big] = \mathbb{E}_s\big[\,s\cdot c\equiv_q\bar s\cdot c\,\big].$$

Proof. This is shown simply by taking the Fourier transform of the result of the lemma above. It can also be seen directly by expanding each factor $\cos(2\pi c_j\kappa/q)$ out to $\frac{\omega_j+\omega_j^{-1}}2$, where $\omega_j = e^{2\pi i c_j\kappa/q}$, and cancelling terms. □

The right side of the equation counts (after multiplying by $2^m$) the number of solutions to the modular partition problem, and involves $2^m$ terms, which could be prohibitive to exhaust over. The left side however sums just $q$ terms, each a product of $m$ factors, which may be much smaller. If we take $q$ to be the sum of all the weights, then we solve the ordinary (non-modular) partition problem in time $O(q\cdot m)$. (This is still exponential if any weight is exponentially large, and therefore not a generally efficient solution to this NP-complete problem.) It is left for future work to integrate this idea properly into quantum algorithms for subset-sum problems.
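The counting procedure can be sketched as follows (helper names hypothetical). Multiplying Corollary 4.2.6 by $2^m$ turns the average over $\kappa$ into a count of subsets $s$ with $s\cdot c\equiv_q\bar s\cdot c$, computed from $q$ products of $m$ cosines. The brute-force count over all $2^m$ subsets is included only as a check. As an assumption of this sketch, the non-modular example uses $q$ equal to the total weight plus one, so that $2\,s\cdot c - \sum_j c_j$ can only vanish modulo $q$ by vanishing outright. With positive weights and $q$ exactly equal to the total weight, the trivial solutions $s = 0\cdots0$ and $s = 1\cdots1$ are also counted.

```python
import itertools
import math


def count_by_cosines(c, q):
    """2^m * E_kappa[prod_j cos(2 pi c_j kappa / q)]: q terms, each a product of m cosines."""
    m = len(c)
    total = 0.0
    for kappa in range(q):
        term = 1.0
        for cj in c:
            term *= math.cos(2 * math.pi * cj * kappa / q)
        total += term
    return round(2 ** m * total / q)


def count_by_brute_force(c, q):
    """Number of subsets s with s.c congruent to (complement of s).c modulo q (2^m terms)."""
    weight = sum(c)
    count = 0
    for s in itertools.product((0, 1), repeat=len(c)):
        sc = sum(sj * cj for sj, cj in zip(s, c))
        if (sc - (weight - sc)) % q == 0:
            count += 1
    return count


weights = [1, 1, 1, 1]
print(count_by_cosines(weights, sum(weights) + 1))  # 6: the subsets of total weight 2
print(count_by_cosines(weights, sum(weights)))      # 8: also counts s = 0000 and s = 1111
```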

Continuous problems and Pell's equation 

We close the Chapter with a proof that eigenvalue estimation within FH2 generalises to the context of continuous groups (cf. [32]), illustrated by reference to the number-theoretic problem of solving Pell's Equation [33].

Theorem 4.2.7. The decision version of the problem of solving Pell's equation lies 
within FH2. 

The proof uses the following lemma : 

Lemma 4.2.8. Let $c = \{c_j\}_{j=1}^m$ be an $m$-tuple of integers. For all real $\phi$ and for all $y\in\{0,1\}^m$,

$$\prod_{j=1}^m\big(1+(-1)^{y_j}\cos(c_j\phi)\big) = \frac1{2^m}\sum_{s,s'\in\{0,1\}^m}(-1)^{y\cdot(s+s')}\cos\big((s-s')\cdot c\,\phi\big).$$

Proof. Let $d$ denote an $m$-bit string and let $\bar d$ denote its complement. By induction on $m$, with liberal use of the basic trigonometric identity

$$\tfrac12\big(\cos(A+B)+\cos(A-B)\big) = \cos(A)\cos(B), \qquad (4.20)$$

it follows that

$$\mathbb{E}_d\big[\cos((d-\bar d)\cdot c\,\phi)\big] = \prod_{j=1}^m\cos(c_j\phi). \qquad (4.21)$$

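Then we can write $d := s\oplus s'$, and we can break $s$ up into two parts (one part supported by $d$ and one part supported by $\bar d$), writing $s = a+b$ where $a\le d$ and $b\perp d$. Then $s' = s\oplus d = \bar a + b$, where $\bar a := a\oplus d$ denotes the complement of $a$ within the support of $d$.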



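Hence, with $a$ ranging uniformly over strings $a\le d$ and $b$ over strings $b\perp d$, and applying line (4.21) to the coordinates in the support of $d$ at the second step,

$$\begin{aligned} \sum_d\prod_{j:d_j=1}(-1)^{y_j}\cos(c_j\phi) &= \sum_d(-1)^{y\cdot d}\prod_{j:d_j=1}\cos(c_j\phi) \\ &= \sum_d(-1)^{y\cdot d}\,\mathbb{E}_a\big[\cos((a-\bar a)\cdot c\,\phi)\big] \\ &= \sum_d(-1)^{y\cdot d}\,\mathbb{E}_a\mathbb{E}_b\big[\cos((a+b-\bar a-b)\cdot c\,\phi)\big] \\ &= \sum_d(-1)^{y\cdot d}\,\mathbb{E}_s\big[\cos((s-(s\oplus d))\cdot c\,\phi)\big] \\ &= \frac1{2^m}\sum_{s,s'}(-1)^{y\cdot(s+s')}\cos\big((s-s')\cdot c\,\phi\big), \end{aligned} \qquad (4.22)$$

whence the statement of the lemma follows by factoring the left side. □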
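Both (4.21) and Lemma 4.2.8 can be spot-checked numerically for small $m$. In the Python sketch below, the helper names, the weights and the angles are our own illustrative choices. `product_side` and `sum_side` are the two sides of the lemma, and `check_421` returns both sides of (4.21), using the fact that $(d-\bar d)_j = 2d_j - 1$.

```python
import itertools
import math


def product_side(y, c, phi):
    """prod_j (1 + (-1)^{y_j} cos(c_j phi))."""
    return math.prod(1 + (-1) ** yj * math.cos(cj * phi) for yj, cj in zip(y, c))


def sum_side(y, c, phi):
    """2^{-m} sum_{s,s'} (-1)^{y.(s+s')} cos((s - s').c phi)."""
    m = len(c)
    total = 0.0
    for s in itertools.product((0, 1), repeat=m):
        for s2 in itertools.product((0, 1), repeat=m):
            sign = (-1) ** sum(yj * (a + b) for yj, a, b in zip(y, s, s2))
            total += sign * math.cos(sum((a - b) * cj for a, b, cj in zip(s, s2, c)) * phi)
    return total / 2 ** m


def check_421(c, phi):
    """Both sides of (4.21): the mean over d of cos((d - dbar).c phi), and prod_j cos(c_j phi)."""
    strings = list(itertools.product((0, 1), repeat=len(c)))
    mean = sum(math.cos(sum((2 * dj - 1) * cj for dj, cj in zip(d, c)) * phi) for d in strings) / len(strings)
    return mean, math.prod(math.cos(cj * phi) for cj in c)


print(check_421((1, 2, 5), 0.3))
print(product_side((0, 1, 1), (1, 2, 5), 0.3), sum_side((0, 1, 1), (1, 2, 5), 0.3))
```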

This identity is useful whenever it is natural to think about continuous examples of eigenvalue estimation (cf. [32]). A good example would be Hallgren's method for solving Pell's equation efficiently [33], which first estimates the real-valued regulator $R$ of a real quadratic number field, by working with a computable real pseudo-periodic function whose period is $R$. Hallgren's method was developed as an extension of Shor's algorithm, so here we sketch a method for recasting it as an extension of Kitaev's eigenvalue estimation, so that it can be rendered within an FH2 framework (i.e. using permutation gates and limiting Hadamard transforms to just two time-slices within the circuit) for our Theorem 4.2.7.

We take h to be a map that 'walks out' a prescribed distance along the metricated 
principal cycle of reduced principal ideals of the quadratic number field specified 
by the problem equation, and returns a representation of the reduced ideal thereby 
reached, together with an 'overshoot' distance for how far 'past' that ideal the walk- 
distance goes. This can be achieved using a circuit that computes a series of giant 
steps and small steps to compute the new ideal, and then rounds off the remainder 
value. (See [37] for a full discussion of the relevant number theory and algorithmics — 
but the notation here is a little different.) 

Identify $\mathbb{R}$ with a covering of the principal cycle, so that, pictorially speaking, the principal ideals $\{\iota_0,\iota_1,\ldots\}$ are laid out on a cycle of length $R$, each ideal having a unique representation. The space on the cycle between successive ideals can be discretised to precision $\frac1N$, for some large integer $N$, so that any real number corresponds to a real point on the cycle, and is discretely approximated by quoting the (unique representation of the) first ideal 'below' it on the cycle, $i(x/N)$, together with the approximate number of steps of size $\frac1N$ from that ideal up to the point in question, $k(x/N)$. We write $h_N$ for the function that approximates $h$ to precision $\frac1N$ in this sense: if $h(x/N) = (i(x/N),k(x/N))$ then $h_N(x) = (i(x/N),\lfloor N\cdot k(x/N)\rfloor/N)$, where $x$ is restricted to integers. Write $q = \lceil N\cdot R\rceil$ for the so-called pseudo-period of the cycle. Here $h_N$ is a pseudo-periodic function of period $q$, because $h_N(q) = h_N(0) = (\iota_0,0)$. Because of the rounding down that takes place when computing $h_N$, the codomain of $h_N$ will likely contain more than $q$ distinct points. But as explained in [37], the 'extra' points quickly become insignificant as $N$ becomes large enough.

We use an approximation of 'pseudo-eigenvectors', together with the metaphor of 
state collapse, to see how our standard method for eigenvalue estimation in FH2 
still works in this continuous context, as follows. 

Proof of Theorem 4.2.7. Consider this transformation, implementable unitarily using a polynomially sized permutation circuit [33], where $s$ is a string of $m$ bits, and $c = \{c_j\}_{j=1}^m$ are appropriately chosen integers:

$$|s\rangle\otimes|h_N(0)\rangle \mapsto |s\rangle\otimes|h_N(s\cdot c)\rangle. \qquad (4.23)$$

Consider also the following set of 'pseudo-eigenvectors' (cf. line (4.12)), for $\kappa\in[0..q-1]$:

$$|A(\kappa)\rangle := \frac1{\sqrt q}\sum_{j=0}^{q-1}\exp(2\pi i\,j\kappa/q)\,|h_N(j)\rangle. \qquad (4.24)$$

These are not quite orthogonal, but make a good approximation to an orthonormal 
basis for the space spanned by the vast majority of the computational basis of the 
second register. 

As with the usual version of Kitaev's algorithm, one performs the above transformation in superposition and then imagines 'collapsing the state' of the second register onto one of the vectors above, selected randomly. This gives, to good approximation, the superposition

$$\propto\sum_s\exp\big(-2\pi i\,(s\cdot c+\epsilon_s)\,\kappa/(NR)\big)\,|s\rangle, \qquad (4.25)$$

the superposition being over those $s$ for which there exists a $j$ such that $h_N(s\cdot c) = h_N(j)$, which will be all but a small proportion of them. Here $\epsilon_s$ denotes some unknown 'noise' term so that $s\cdot c+\epsilon_s = j \pmod{NR}$. Therefore $\epsilon_s\in[0,1)$.

The first register is measured in the Hadamard basis as usual, and so the probability distribution for the returned string, conditioned on the stochastic choice of $\kappa$, will be

$$P(y) \approx \frac1{2^{2m}}\sum_{s,s'}(-1)^{y\cdot(s+s')}\cos\Big(\frac{2\pi(s\cdot c-s'\cdot c+\epsilon)\kappa}{NR}\Big), \qquad (4.26)$$

where $\epsilon\in[0,2)$ (differing from term to term) arises from two combined noise terms.

There is an $O(1)$ probability that the $\kappa$ selected stochastically will be sufficiently small that the term $\epsilon\kappa$ in the phase makes very little difference to any of these probabilities, and so Lemma 4.2.8 tells us that

$$P(y) \approx \frac1{2^m}\prod_{j=1}^m\Big(1+(-1)^{y_j}\cos\Big(\frac{2\pi c_j\kappa}{NR}\Big)\Big), \qquad (4.27)$$

which means that with $O(1)$ probability the standard estimation algorithm of §4.2.1 will work as intended, recovering the real value $\phi = \kappa/NR$ for some random $\kappa$. By running the algorithm twice, values can be recovered for two different $\kappa$s. With good probability these will be coprime, and so a continued fractions analysis of the ratio of the two recovered $\phi$ values will likely yield the actual integer values of the $\kappa$s, whence the actual value of $NR$, and hence of $R$ itself, can be recovered. □
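The last step of this proof can be sketched numerically. The Python below is an illustration under assumptions of our own: the true value `P` standing in for $NR$, the two $\kappa$ values, the noise level and the bound `max_kappa` are all hypothetical. It combines two estimates $\phi_i\approx\kappa_i/P$. The continued-fraction expansion of $\phi_1/\phi_2$, truncated by `Fraction.limit_denominator`, yields $\kappa_1/\kappa_2$ when the $\kappa_i$ are coprime and the estimates are precise enough, after which $P \approx \kappa_i/\phi_i$.

```python
from fractions import Fraction


def recover_period(phi1, phi2, max_kappa):
    """Estimate the real period P from phi_i ~ kappa_i / P with coprime kappa_i <= max_kappa."""
    ratio = Fraction(phi1 / phi2).limit_denominator(max_kappa)
    kappa1, kappa2 = ratio.numerator, ratio.denominator
    # Average the two period estimates kappa_i / phi_i.
    return kappa1, kappa2, (kappa1 / phi1 + kappa2 / phi2) / 2


P = 1234.5678            # stands in for N * R
kappa1, kappa2 = 17, 40  # coprime, so the reduced ratio reveals both values
phi1 = kappa1 / P + 1e-12
phi2 = kappa2 / P - 1e-12
print(recover_period(phi1, phi2, max_kappa=1000))
```

If the bound `max_kappa` is set too large relative to the precision of the $\phi_i$, the truncated expansion can lock onto a spurious nearby fraction; dividing the recovered period by $N$ then gives $R$.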
Chapter 5

The Clifford-Diagonal Hierarchy



The theme for this chapter is ostensibly closely related to the previous one. Instead 
of taking polynomially-bounded 'classical' circuits and Hadamard transforms as the 
building blocks for algorithms as we did for the Fourier hierarchy, here we take 
Clifford circuits and 'diagonal' circuits. 

Diagonal circuits have the property that every gate commutes with every other gate, and so no 'temporal complexity' can be encoded within a part of a circuit built exclusively from these. We introduce "IQP computing" (see §5.2.2) as a particularly simple paradigm for understanding what kinds of probability distribution can be sampled using very little temporal complexity. As before, we use a discrete-time model and a finite-dimensional unitary space, but in this chapter complex phases will be used within the 'diagonal' circuits.¹

Many of the concepts introduced previously will be seen to translate into this framework (cf. circuit families §1.3.1, adaption §4.1.4, algorithm hierarchies §4.1.2, oracles §1.3.2, ...); but rather than showing how to cast well-known algorithms in the new paradigm, we instead use it to find a new application for quantum algorithmics. Our new application (see §5.5) consists in one side of a particular novel two-player interactive protocol, which we conjecture cannot be completed via purely classical means, but which completes using the idea of IQP computing. At present, the only useful application of this protocol seems to be for demonstrations of computing power exceeding that of classical computation.



1 Much of the content has been extracted from a 2008 publication of mine with Michael Bremner 
5.1 Overview

The Clifford-Diagonal (CD) hierarchy is, by analogy with the Fourier hierarchy, an 
arrangement of complexity classes, culminating in BQP. It has not been formally 
introduced in the literature, though it is certainly implicit in [16]. Informally speak- 
ing, an algorithm is said to lie in the kth level of the CD hierarchy if it can be 
rendered using circuits that interleave k layers of Clifford gates and 'diagonal' gates 
(unitary maps whose description in the Hadamard basis is diagonal). As with the 
Fourier hierarchy of Chapter 4 (in particular §4.1.4), one can naturally define the CD 
hierarchy using mixed adaption, or define a slightly stricter version using quantum 
adaption, or define an 'oracular' version using classical adaption. Our focus in this 
Chapter is only on the latter of these three options, described more fully in §5.2.1. 
It is somewhat surprising that the oracular definition might have any computational 
power exceeding that of a classical computer, because there is essentially no tempo- 
ral 'structure' encoded within such an oracle. Although we have not found a decision 
language that can be decided more quickly with the help of the oracle, our main 
technical contribution is to identify a novel two-party 'pseudo-cryptographic' proto- 
col that seems to be efficient only when one of the parties can implement the oracle. 
This is described in §5.5 and analysed in §5.6 where the cryptographic analogy is 
emphasised. 

To grasp the motivation behind the division of circuitry into 'Clifford' parts and 
'diagonal' parts, it is necessary first to understand the computational capabilities of 
these parts in isolation. The Clifford group may be defined in terms of the Pauli 
group, which is itself defined by its action on qubits. Therefore our descriptions of 
quantum circuitry will be limited to qubit processing. Many of the concepts required 
for discussing quantum circuitry on qubits have already been given in §1. Since the 
C-Not gate, the single-qubit Pauli gates, and the single-qubit Hadamard gates are all 
in the Clifford group, and since the single-qubit $\pi/8$ rotation gate forms a 'diagonal'
group, it is clear that the Clifford-Diagonal decomposition methodology can be seen 
as arising from the standard BQP-universal gateset and hence allows for universal 
quantum computation in the limit of allowing arbitrarily many (i.e. polynomially 
many) interwoven layers of circuitry. 

We begin with some definitions (§5.2), then in §5.3 we discuss architectures within 
which implementation of the lower levels of the CD hierarchy might be comparatively 
easy, before proceeding with a study of the mathematics and algorithms (though not 
classes of decision languages) of the lowest levels of the hierarchy in §5.4. 
5.2 Definitions

We define Clifford circuits, X-programs, and the IQP oracle. The IQP oracle can be thought of as standing in the same relationship to the CD hierarchy as the Fourier Sampling oracle stands to the Fourier hierarchy (cf. Chapter 4). In this dissertation, the CD hierarchy itself is not considered directly, and so detailed definitions for it are omitted.

5.2.1 Basic definitions 

Definition of the Clifford group 

Definition 5.2.1. Within the algebra of unitary maps on some number $n$ of qubits, $C$ is in the Clifford group if for every $P$ in the Pauli group, $C\cdot P\cdot C^\dagger$ is also in the Pauli group.

Thus the Clifford group is the (discrete) group of unitary maps acting on qubits 
that stabilises the Pauli group. 

$$\text{Pauli group} = \langle X_j, Z_j : j\in[1..n]\rangle \qquad (5.1)$$

has cardinality $4^{n+1}$, where $n$ counts the number of qubits under consideration, if a complex global phase change by $i$ is included for mathematical convenience. If we remove the global phases of $i$, but leave the global $-1$ phases, we obtain the signed Pauli group, of cardinality $2\cdot 4^n$.

$$\text{Clifford group} = \langle \Lambda(Z), \sqrt Z, H\rangle \qquad (5.2)$$

has cardinality 2 n +2n+3 -3-15-63 • • • (4 n — 1), (c/. [46] ), where n counts the number 
of qubits under consideration. Global phase changes in multiples of eighth roots of 
unity are automatically included by the definition above, constituting the centre of 
the Clifford group, so one should remove a factor of 8 from the cardinality if not 
wishing to count these. Then this cardinality can be understood as arising from 
automorphisms of the signed Pauli group : 2 • (4 n — 1) choices for the image of X\, 
then 2 • 2 • 4 n_1 choices for the image of Z\, and so on down to 2 • (4 1 — 1) choices 
for X n , then 2-2-4° choices for Z n — the total cardinality being that of the Clifford 
group quotiented by its centre. 
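The counting argument above can be checked numerically. In this sketch (function names ours), `clifford_order` evaluates the closed formula and `signed_pauli_automorphisms` multiplies together the choices for the images of $X_1, Z_1, \ldots, X_n, Z_n$ exactly as listed in the text; the two should differ by the factor 8 of the centre.

```python
from math import prod

def clifford_order(n: int) -> int:
    """Clifford group order on n qubits, eighth-root-of-unity phases included."""
    return 2 ** (n * n + 2 * n + 3) * prod(4 ** j - 1 for j in range(1, n + 1))

def signed_pauli_automorphisms(n: int) -> int:
    """Product of the choices for the images of X_1, Z_1, ..., X_n, Z_n."""
    count = 1
    for j in range(n, 0, -1):           # j qubits' worth of Paulis still unconstrained
        count *= 2 * (4 ** j - 1)       # image of the next X
        count *= 2 * 2 * 4 ** (j - 1)   # image of the next Z
    return count

for n in range(1, 5):
    print(n, clifford_order(n), clifford_order(n) // 8, signed_pauli_automorphisms(n))
```

For one qubit this gives 192 elements with phases and 24 without, the familiar single-qubit Clifford group modulo phase.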

The Gottesman-Knill theorem provides that so-called stabilizer states, which are
the orbit of $|0\rangle$ under the Clifford group, are efficiently representable, in such a way
that the dynamics of the Clifford group acting on its own are perfectly tractable
classically [3]. A particularly efficient classical simulation method is given in [9].
This kind of computation is not universal even for classical computation. Clifford
circuits acting on stabilizer states, with single-qubit measurements in the compu-
tational basis, generate the same kinds of computations in polynomial time as are
available using just classical parity-circuits. As regards decision languages, the class
of languages decidable by a log-space Turing machine equipped with an oracle to
analyse such circuits is none other than ⊕L (cf. §3.3.3). This class, lying somewhere
between L and P, forms a natural 'base' for many computing reductions. (For recent
work clarifying the 'role' of ⊕L in quantum algorithmics, see [68].)

Definition of X-programs 

Definition 5.2.2. An "X-program" on n qubits (cf. [60]) is a list of pairs $(\theta_p, p) \in
[0, 2\pi] \times \mathbb{F}_2^n$, so that $\theta_p$ is an angle and p is a string of n bits. The order of the list is
unimportant, but its length must scale polynomially in n, when considering uniform
families.

Each pair $(\theta_p, p) \in P$ is called an element of the X-program. To this X-program P
we associate a Hamiltonian, denoted

$$H_P := \sum_{(\theta_p, p) \in P} \theta_p \prod_{j : p_j = 1} X_j, \qquad (5.3)$$

having as many terms as there are elements in the list defining P. That is to say,
each string p indicates a subset of the n qubits for Pauli X to act upon, with
'action' $\theta_p$.

The 'diagonal' unitary map produced by such an X-program P is then taken to
be

$$\exp(iH_P) = \prod_{(\theta_p, p) \in P} \exp\Big( i\theta_p \prod_{j : p_j = 1} X_j \Big), \qquad (5.4)$$

which is indeed diagonal in the Hadamard basis.
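A brute-force check of (5.4) may help fix conventions. This is a minimal sketch assuming NumPy, with our own helper names and an arbitrary example program. Since $X_p^2 = I$, each factor is $\exp(i\theta_p X_p) = \cos\theta_p\, I + i\sin\theta_p\, X_p$. Conjugating the product by $H^{\otimes n}$ should leave a diagonal matrix whose entry at the basis string d is $\exp\big(i\sum_p \theta_p (-1)^{p \cdot d^T}\big)$, and reordering the program should not change the unitary.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)
H1 = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def kron_all(mats):
    """Tensor product of a list of matrices; the first factor is the first bit of a string."""
    out = np.eye(1, dtype=complex)
    for m in mats:
        out = np.kron(out, m)
    return out

def xprogram_unitary(program, n):
    """exp(i H_P) for an X-program given as (theta, p) pairs, p an n-character bit string."""
    U = np.eye(2 ** n, dtype=complex)
    for theta, p in program:
        Xp = kron_all([X if bit == "1" else I2 for bit in p])
        # X_p squares to the identity, so exp(i theta X_p) = cos(theta) I + i sin(theta) X_p
        U = U @ (np.cos(theta) * np.eye(2 ** n) + 1j * np.sin(theta) * Xp)
    return U

def expected_phase(program, d):
    """exp(i * sum_p theta_p * (-1)^{p . d}) for a bit string d."""
    s = sum(theta * (-1) ** sum(int(a) * int(b) for a, b in zip(p, d)) for theta, p in program)
    return np.exp(1j * s)

program = [(0.3, "110"), (0.7, "011"), (1.1, "101"), (np.pi / 8, "111")]
n = 3
Hn = kron_all([H1] * n)
D = Hn @ xprogram_unitary(program, n) @ Hn
print(np.allclose(D, np.diag(np.diag(D))))   # diagonal in the Hadamard basis
print(all(np.isclose(D[i, i], expected_phase(program, format(i, f"0{n}b")))
          for i in range(2 ** n)))
```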

When an Abelian group is being used for the gates within a circuit, as is the case
here, the circuit will be essentially devoid of temporal structure, since the order of the gates
is immaterial. It is convenient then to think in terms of the Hamiltonian, because
every term of the Hamiltonian commutes with every other. Each term, individually
described by a pair $(\theta_p, p)$, acts on the qubits indicated by the string p, with
action $\theta_p$.² So the circuit nature of an implementation is not especially relevant to
the unitary map associated to an X-program.

Mathematically, one can think of P as a function from $\mathbb{F}_2^n$ to $\mathbb{R}$, sending p to $\theta_p$
if $(\theta_p, p)$ is an element, and sending p to 0 if there is no corresponding element.
But since it is usually our intention that the number of elements will be far smaller
than $2^n$, and since we will often take all the (non-zero) $\theta_p$ values to be the same,
it is also convenient to think of P simply as a subset of $\mathbb{F}_2^n$. When we do not
wish to refer to the $\theta_p$ values, it is sometimes convenient to think of the subset P
alternatively as a binary matrix of width n, whose rows correspond to the elements
of the X-program.

2 Action is the temporal integral of work, in classical physical terms.

5.2.2 Definition of IQP oracle

An X-program is an explicitly 'quantum' object, but it is also convenient to have a
non-quantum description. Analogously to Definition 4.1.8 for the Fourier Sampling
oracle, we define the IQP oracle. Introduced in [60], the abbreviation denotes
"Instantaneous Quantum Polynomially-bounded". Here 'instantaneous' refers to the
explicit absence of temporal structure within the X-program description, though any
given implementation may well require a non-trivial amount of time to run.

Definition 5.2.3. In the notation of line (5.3), the IQP oracle is a device which,
on input a classical description P of an X-program, returns a single sample string
x from the probability distribution given by

$$\mathbb{P}(X = x) := \big| \langle x | \exp(iH_P) | 0 \rangle \big|^2. \qquad (5.5)$$

The output string x is simply a measurement result, regarded as a (probabilistic)
sample from the vector space $\mathbb{F}_2^n$, where again n counts the number of qubits men-
tioned in $H_P$.
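For small n the IQP oracle of Definition 5.2.3 can be emulated by brute force. The sketch below is pure Python; `iqp_distribution` and `iqp_oracle` are hypothetical names. It uses $\exp(i\theta_p X_p) = \cos\theta_p\, I + i\sin\theta_p\, X_p$ together with $X_p|y\rangle = |y \oplus p\rangle$. Its cost grows as $2^n$, so it illustrates the definition rather than offering an efficient classical simulation; the post-selection argument of this section suggests no efficient one is likely to exist.

```python
import math
import random

def iqp_distribution(program, n):
    """p(X = x) = |<x| exp(i H_P) |0>|^2 by state-vector simulation.
    program: list of (theta, p), p an n-character bit string; outcomes x use the same reading."""
    dim = 2 ** n
    psi = [0j] * dim
    psi[0] = 1 + 0j
    for theta, p in program:
        mask = int(p, 2)
        c, s = math.cos(theta), math.sin(theta)
        # exp(i theta X_p) = cos(theta) I + i sin(theta) X_p, and X_p |y> = |y XOR p>
        psi = [c * psi[y] + 1j * s * psi[y ^ mask] for y in range(dim)]
    return {format(x, f"0{n}b"): abs(a) ** 2 for x, a in enumerate(psi)}

def iqp_oracle(program, n, rng=random):
    """One call to the (simulated) IQP oracle: a single sample string x."""
    dist = iqp_distribution(program, n)
    return rng.choices(list(dist), weights=list(dist.values()))[0]

program = [(math.pi / 8, "110"), (math.pi / 8, "011"), (math.pi / 8, "111")]
print(iqp_distribution(program, 3))
print(iqp_oracle(program, 3, rng=random.Random(1)))
```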

Our interest lies primarily not in the decision languages that polytime-bounded ma-
chines can decide with access to such an oracle (i.e. $\mathrm{BPP}^{\mathrm{IQP}}$), but in the wider
notions of computing that go beyond mere decision languages, to encompass other
computational concepts such as interactive games. As an historical aside, we note
that Simon [66] wrote about algorithms that use nothing more than an oracle and
an Hadamard transform, and which therefore could be described as 'temporally un-
structured'. However, his notion of 'oracle' was one tailored for a universal quantum
architecture, being essentially an arbitrarily complex general unitary transforma-
tion, and since there is no natural notion of one of these within our 'temporally
unstructured' paradigm, the oracle of Simon's algorithm cannot be simulated by an
IQP oracle.
5.2.3 Adaption description

We have said that an X-program contains no 'temporal structure' because the order
in which the Hamiltonian terms of line (5.3) are applied makes no difference. It
is perhaps interesting to note that if we reintroduce temporal structure by allowing
the elements of an X-program to be subject to classical parity-control as discussed in
§3.3.4, allowing also for intermediate computational-basis measurements as occurs
within the measurement-based quantum computational paradigm of [51, 54] &c,
then the full power of BPP classical computing can be recovered. This is because
there is a simple gadget³ on one qubit for emulating a classical And gate: see
Fig. 5.1. (Whereas a BPP circuit composed of gates of type Not and C-Not can
be simulated entirely within the parity logic that feeds classical data forward from
one X-program to the next, to simulate a BPP gate of type And, for example, it is
necessary to apply a three-element single-qubit program where the three elements are
controlled respectively by the two inputs to the And gate and their parity, thereby
potentially increasing the overall depth of the simulation each time an And gate is
simulated.)



\Q)-VX — </X — /X f — ) 



-e 



Figure 5.1: Using 'classical adaption', with parity-control of 'diagonal' gates, to simulate
a classical And gate. The green wires denote classical control signals to √X gates and the
classical measurement outcome.
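The exact gate angles of Fig. 5.1 are not recoverable here, so the following sketch assumes one natural choice. It uses rotations $\exp(i\varphi X)$ with $\varphi = +\pi/4$ controlled by a, $+\pi/4$ controlled by b and $-\pi/4$ controlled by $a \oplus b$; note that $\exp(-i\pi X/4)$ equals $\sqrt{X}$ up to a global phase. Because the three rotations commute, their angles add to $\frac{\pi}{4}(a + b - a \oplus b) = \frac{\pi}{2}ab$. A computational-basis measurement of the qubit then returns $a \wedge b$ with certainty. The function name is ours.

```python
import math

def and_gadget(a: int, b: int) -> float:
    """Probability of reading 1 from a qubit prepared in |0> after parity-controlled rotations
    exp(i*phi*X): +pi/4 if a, +pi/4 if b, -pi/4 if a XOR b (assumed angles)."""
    phi = math.pi / 4 * a + math.pi / 4 * b - math.pi / 4 * (a ^ b)
    # The rotations commute, so their angles add; exp(i*phi*X)|0> = cos(phi)|0> + i sin(phi)|1>
    return math.sin(phi) ** 2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(and_gadget(a, b), 12))
```

The parity $a \oplus b$ is computed by the classical feed-forward logic, so the quantum side only ever sees a single qubit and a sum of commuting rotations.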



The power of post-selection 

Recall that Aaronson [1] showed that if one employs post-selection (cf. §3.2) of the
measurement results of a BQP circuit, the computational power is boosted enor-
mously to encompass all of PP. Post-selection amounts to asking for some of the
measurement outcomes to take specific values, even if those values are exponen-
tially unlikely, before using the remaining measurement values to make a decision.
We note here that the same results hold true for circuits (or X-programs) merely
implementing IQP, that is, using one call to an IQP oracle.

Definition 5.2.4. The class $\mathrm{BPP}^{\mathrm{IQP}}$ with post-selection is denoted PostIQP.
It consists of all languages of the form $\mathrm{Post}_{\mathcal{C}_S}(\mathcal{C}_C, \mathcal{P})$ in the notation of Defini-
tion 3.2.5, where $\mathcal{C}_S$ and $\mathcal{C}_C$ are both in BPP and where $\mathcal{P}$ is a (uniform) family
of probability distributions given by a uniform family of X-programs via Defini-
tion 5.2.3.

3 The design of this gadget is based on a similar concept developed by Daniel Browne, discussed
in various recent conferences.

Proposition 5.2.5. PostIQP = PP.

Proof. It suffices to show that PostBQP ⊆ PostIQP, the rest (PostIQP ⊆ PostBQP
= PP) already being established. To see this, consider any general BQP circuit that is
composed of H gates together with Z, Λ(Z), and Λ²(Z) gates (this set being BQP-universal),
some of whose qubits are output at the end and some of whose qubits are post-selected.
Assume without loss of generality that the input to the circuit is $|0\rangle^{\otimes n}$ and that
the output and post-selection is in the computational basis. Also assume without
loss of generality that the first and last gate on every qubit is H. The remaining
H gates which are neither first nor last on a qubit line can be replaced with the
post-selection gadget of Fig. 5.2.




Figure 5.2: An Hadamard gadget, for replacing an Hadamard gate in a context where 
post-selection is admitted. The lower qubit is a primal qubit, the upper one is an ancilla. 
The red dot after the bra symbol denotes post-selection of that outcome. 



This gadget, replacing an H gate on a (primal) qubit, acts as follows: it introduces
an ancilla qubit in state $|0\rangle$, applies an H gate to it, swaps it with the primal
qubit, applies a Controlled-Z gate between the two, applies another H gate to the
ancilla, and then post-selects for that ancilla to be in state $|0\rangle$. If this gadget is used
everywhere to remove the 'internal' H gates, then we are left with a (post-selected)
circuit having no inherent temporal structure. This is because all the 'internal' gates
are now diagonal, and therefore mutually commutative. (The swap operations are
to be regarded passively as relabelings, rather than actively as gates.) Regarding
the remaining H gates at the beginning and end of each qubit as passive changes
of basis, the circuit is then functionally equivalent to a (post-selected) X-program
in which all terms in the Hamiltonian $H_P$ affect at most three qubits and have values
θ that are some multiple of π/8.

Re-expressing this idea in the usual 'calculus' of unitaries, let d denote the qubit
on which takes place the H operation that we wish to remove, let E denote the
remaining qubits, and let a denote the ancilla qubit that we introduce for emulating
the Hadamard gate. Then the transformation is given by

$$\mathcal{H}_B \, V_{d,E} \, H_d \, U_{d,E} \, \mathcal{H}_B \, |0\rangle_d |0\rangle_E \;\longmapsto\; \langle 0|_a \, \mathcal{H}_B \, V_{d,E} \, \Lambda_a(Z_d) \, U_{a,E} \, \mathcal{H}_B \, |0\rangle_a |0\rangle_d |0\rangle_E,$$

where U and V denote the remaining parts of the circuit and $\mathcal{H}_B$ denotes the
Hadamard gate on every qubit. Post-selection then requires qubit a to end up
in state $|0\rangle_a$.

To see why this emulates an Hadamard transform, we need only compare $H_d|\psi\rangle_d$
with $\Lambda_a(Z_d)|\psi\rangle_a|+\rangle_d$ for arbitrary $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$. So $H_d|\psi\rangle_d \propto (\alpha + \beta, \alpha - \beta)$,
written as a vector of amplitudes, while $\Lambda_a(Z_d)|\psi\rangle_a|+\rangle_d \propto (\alpha, \alpha, \beta, -\beta)$. We then
apply $H_a$ for transferring the ancilla back into the basis in which it is post-selected,
obtaining the vector $(\alpha + \beta, \alpha - \beta, \alpha - \beta, \alpha + \beta)$, so that post-selection leaves
$(\alpha + \beta, \alpha - \beta)$, as required. □
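The amplitude calculation above can be reproduced numerically. This is a minimal NumPy sketch with our own helper names. It uses qubit order (a, d) and treats the swap as the passive relabelling described in the text, so the gadget starts from $|\psi\rangle_a|+\rangle_d$. For a random single-qubit state the post-selected output should equal $H|\psi\rangle$ exactly. Post-selection succeeds with probability $(|\alpha+\beta|^2 + |\alpha-\beta|^2)/4 = 1/2$.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CZ = np.diag([1.0, 1.0, 1.0, -1.0])      # Lambda_a(Z_d); symmetric in a and d

def hadamard_gadget(psi):
    """Post-selected output of the Fig. 5.2 gadget on the single-qubit state psi.
    Returns the normalised state left on d and the post-selection probability."""
    plus = np.array([1.0, 1.0]) / np.sqrt(2)
    state = np.kron(psi, plus)           # after the passive swap: |psi>_a |+>_d
    state = CZ @ state                   # Controlled-Z between a and d
    state = np.kron(H, I2) @ state       # H on qubit a
    kept = state[:2]                     # amplitudes with a = 0 (post-selection)
    prob = np.vdot(kept, kept).real
    return kept / np.sqrt(prob), prob

rng = np.random.default_rng(0)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
out, prob = hadamard_gadget(psi)
print(np.allclose(out, H @ psi), round(prob, 12))
```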

This line of reasoning indicates that it is unlikely that a classical computer would be
able efficiently to tell us everything we might care to learn about the distribution of
the "IQP random variable" X of line (5.5). A similar line of reasoning is employed
in [26], where it is shown that exactly simulating constant-depth quantum circuits
classically is hard, but that any family of constant-depth quantum circuits that
decides a language with zero failure probability is efficiently classically simulable. We
remark that the analogous Proposition 4.1.11 was shown to hold for the Fourier Sampling
oracle of Chapter 4, for essentially the same reason: that post-selection supervenes classical
adaption. The proof is effectively formed by substituting Fig. 4.1 for Fig. 5.2.

5.3 Physical Considerations 

The lack of temporal structure within the Hamiltonian of an X-program, owing to the
fact that its description is in terms of an Abelian group, leads one to wonder whether
it might not be possible to solve the engineering challenge of implementing an IQP
oracle using techniques that would be unsuitable for general purpose quantum com-
puting. If this were true, then the IQP framework could be useful for describing
particularly 'easy' quantum algorithms. But for this idea to have any chance of
making sense, it is preferable that an architecture exist whereby the physical Hamil-
tonian required does not involve terms of more than two qubits. General X-programs
do not have this property. For this reason, we also define Graph-programs. Browne
and Briegel [16] wrote about CD-decomposition, which is the first rigorous treatment
that we know of that explicitly links graph state temporal depth with commutativity
of Hamiltonian terms used to simulate a graph state computation. It is from their
terminology that we have chosen the Chapter title.
5.3.1 Graph-programs

Graph state computing architectures are popular candidates for scalable fully uni-
versal quantum processors [54, 51]. Here, of course, we are concerned not with
universal architectures per se, but with the appropriate restriction to 'unit time'
computation: the lowest levels of the CD hierarchy. Another way to construct an
IQP oracle uses so-called "Graph-programs", named since such a program is most
easily described as the construction of a graph state followed by a series of mea-
surements of the qubits in the graph state in various bases [16]. A graph state has
qubits that are initially devoid of information, but which are entangled together ac-
cording to the pattern of some pre-specified graph, using the $G_{\mathcal{G}}$ operator of §2.2.2.
Unlike universal graph state computation, our Graph-programs do not admit any
adaptive feed-forward, which is to say that all measurement angles must be known
and fixed at compile-time, so that all measurements can be made simultaneously
once the graph state has been built. In this sense, the 'depth' of a Graph-program
is 1. There is a sense in which one may regard such a program as scarcely involving
dynamics at all.

Definition 5.3.1. A "Graph-program" is specified by giving an undirected graph
$\mathcal{G}(V, \mathcal{E})$ (usually bipartite), with labelled and distinguished vertices. The vertex set
is denoted V, of cardinality n, and for each $v \in V$ there is to be given an element of
SU(2); $R_v \in SU(2)$. The edge set is denoted $\mathcal{E}$. We associate to it the probability
distribution

$$\mathbb{P}(X = x) = \Big| \langle x | \prod_{v \in V} R_v \prod_{(u,v) \in \mathcal{E}} \Lambda_u(Z_v) \, |+\rangle^{\otimes n} \Big|^2. \qquad (5.6)$$

To execute a Graph-program is to sample from this distribution.



To implement the program, a qubit is associated with each vertex and is initialised
to the state $|+\rangle$ in the Hadamard basis. Then a Controlled-Z Pauli gate is applied
between each pair of qubits whose vertices are a pair in $\mathcal{E}$. Since these Controlled-Z
gates commute, they may be applied simultaneously, at least in theory. This process
is equivalent to application of the $G_{\mathcal{G}}$ constructor, discussed in §2.2.2. Finally, each
vertex qubit v is measured in the direction prescribed by its label $R_v$, returning a
single classical bit. Clearly the order of measurement doesn't matter, because the
measurement direction is prescribed rather than adaptive. A sample from $\mathbb{F}_2^n$
(a bit-string) is thus generated as the total measurement result.
5.3.2 Emulation of X-programs

We will show how Graph-programs can simulate the output of X-programs if a
little trivial classical post-processing of the measurement results is allowed. (Graph-programs
would seem to be a little more general than X-programs: it does not seem
possible to emulate an arbitrary Graph-program with an X-program, because generally
Graph-programs use gates that are neither in the Clifford group nor diagonal
in the Hadamard basis.)

Proposition 5.3.2. Any X-program can be efficiently simulated by a Graph-program.
That is, a device for sampling from general distributions of the form at line (5.6)
can emulate an IQP oracle, if classical post-processing is permitted, such that the
size of the description of the Graph-program is polynomially bounded by the size of
the description of the X-program.

Proof. Suppose we're given an X-program, P, thought of as a function from $\mathbb{F}_2^n$ to
$\mathbb{R}$, and also as a list of elements $\{(\theta_p, p) : p \in P \subseteq \mathbb{F}_2^n\}$. Let V be the disjoint
union of $[1..n]$ and $P \subseteq \mathbb{F}_2^n$, so that the graph state used to simulate the program will
have one primal qubit/vertex for each qubit being simulated (that is, n of them),
plus one ancilla qubit/vertex for each program element $p \in P$. Write $\#P$ for the
number of elements in P. The cardinality of V is then $n + \#P$.

We build a bipartite graph by connecting some of the primal vertices to some of
the ancilla vertices. For $j \in [1..n]$ and $p \in P$, let $(j, p) \in \mathcal{E}$ exactly when the jth
component of p is a 1. Now let $R_j$ be the Hadamard element (H) for all primal
qubits, so that all primal qubits are measured in the Hadamard basis. And let
$R_p = \exp(i\theta_p X)$, so that every ancilla qubit is measured in the (YZ)-plane at an
angle specified by the corresponding program element. See Fig. 5.3 for an example
with n = 4 primal qubits and $\#P = 7$ ancillae.

[Figure 5.3 diagram: a 7-by-4 binary matrix P (left) and the corresponding bipartite graph state (right).]

Figure 5.3: On the left, P is an example of a description of an X-program, given as a
matrix (action values θ not shown in the matrix). The graph state on the right, where
the four lighter nodes represent primal qubits and the seven darker nodes represent ancilla
qubits, can be used to simulate this X-program, as described in the text.

If the resulting Graph-program is executed, it will return a sample vector $x \in \mathbb{F}_2^{n+\#P}$
for which the n bits from the primal qubits are correlated with the $\#P$ bits from
the ancillas in a fashion which captures the desired output (though these two sets
separately, i.e. marginally, will look like flat random data). To recover a sample
from the desired distribution, we simply apply a classical C-Not gate from each
ancilla bit to each neighbouring primal bit, according to $\mathcal{E}$, and then discard all the
ancilla bits.

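One can use simple circuit identities to check that this produces the correct distri-
bution of line (5.5) precisely. For example, if we merge the post-processing C-Not
gates into the quantum calculus description of the state, then we see

$$\Big( \prod_{(j,p) \in \mathcal{E}} \Lambda_p(X_j) \Big) \cdot \Big( \prod_{p \in P} e^{i\theta_p X_p} \Big) \cdot \Big( \prod_{j \in [1..n]} H_j \Big) \cdot \Big( \prod_{(j,p) \in \mathcal{E}} \Lambda_p(Z_j) \Big) |+\rangle^{\otimes (n + \#P)},$$

which is equivalent with

$$\Big( \prod_{(j,p) \in \mathcal{E}} \Lambda_p(X_j) \Big) \cdot \Big( \prod_{p \in P} e^{i\theta_p X_p} \Big) \cdot \Big( \prod_{(j,p) \in \mathcal{E}} \Lambda_p(X_j) \Big) |0\rangle^{\otimes n} |+\rangle^{\otimes \#P},$$

because conjugating each $\Lambda_p(Z_j)$ by $H_j$ gives $\Lambda_p(X_j)$, and $H_j|+\rangle_j = |0\rangle_j$.
But $e^{i\theta_p X_p}$ conjugated by $\Lambda_p(X_j)$ is simply $e^{i\theta_p X_p X_j}$; so the state above may be
rewritten using the notation of line (5.3) as

$$\exp\Big( i \sum_{p \in P} \theta_p X_p \prod_{j : p_j = 1} X_j \Big) |0\rangle^{\otimes n} |+\rangle^{\otimes \#P}.$$

But $|+\rangle_p$ is an eigenvector of $X_p$ (with eigenvalue +1), so if we ignore the ancilla states
(now separate anyway), we are left with

$$\exp(iH_P) |0\rangle^{\otimes n},$$

as required for the X-program. □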
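The reduction can also be checked numerically on a small instance. This is a brute-force sketch in pure Python under our own conventions: qubit j is bit j of a basis index, the ancilla for the r-th program element is qubit n + r, and all names are ours. It prepares the bipartite graph state and applies $R_j = H$ to the primal qubits and $R_p = \exp(i\theta_p X)$ to the ancillas. It then applies the classical C-Not post-processing to every measurement outcome, discards the ancilla bits, and compares the result with a direct simulation of the X-program.

```python
import math

def xprogram_dist(P, thetas, n):
    """p(X = x) = |<x| exp(i H_P) |0>|^2 by direct state-vector simulation (qubit j = bit j)."""
    psi = [0j] * (2 ** n)
    psi[0] = 1 + 0j
    for p, theta in zip(P, thetas):
        mask = sum(bit << j for j, bit in enumerate(p))
        c, s = math.cos(theta), math.sin(theta)
        psi = [c * psi[y] + 1j * s * psi[y ^ mask] for y in range(2 ** n)]
    return [abs(a) ** 2 for a in psi]

def apply_1q(psi, q, g):
    """Apply the 2x2 matrix g (nested lists) to qubit q of the state vector psi."""
    out, bit = psi[:], 1 << q
    for i in range(len(psi)):
        if not i & bit:
            a0, a1 = psi[i], psi[i | bit]
            out[i] = g[0][0] * a0 + g[0][1] * a1
            out[i | bit] = g[1][0] * a0 + g[1][1] * a1
    return out

def graph_program_emulation(P, thetas, n):
    """Run the bipartite Graph-program of Proposition 5.3.2, then post-process classically."""
    N = n + len(P)                       # primal qubits 0..n-1; ancilla for row r is qubit n + r
    edges = [(j, n + r) for r, p in enumerate(P) for j in range(n) if p[j]]
    psi = [1 / math.sqrt(2 ** N) + 0j] * (2 ** N)          # |+> on every vertex
    for u, v in edges:                                       # Controlled-Z on every edge
        psi = [-a if (i >> u) & 1 and (i >> v) & 1 else a for i, a in enumerate(psi)]
    h = 1 / math.sqrt(2)
    for j in range(n):                                       # R_j = H on primal qubits
        psi = apply_1q(psi, j, [[h, h], [h, -h]])
    for r, theta in enumerate(thetas):                       # R_p = exp(i theta_p X) on ancillas
        c, s = math.cos(theta), math.sin(theta)
        psi = apply_1q(psi, n + r, [[c, 1j * s], [1j * s, c]])
    dist = [0.0] * (2 ** n)
    for i, a in enumerate(psi):                              # computational-basis outcome i
        x = i & ((1 << n) - 1)                               # primal bits
        for j, v in edges:                                   # classical C-Not: ancilla v -> primal j
            if (i >> v) & 1:
                x ^= 1 << j
        dist[x] += abs(a) ** 2                               # ancilla bits are discarded
    return dist

P = [(1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
thetas = [0.3, 0.7, 1.1, math.pi / 8]
direct = xprogram_dist(P, thetas, 3)
emulated = graph_program_emulation(P, thetas, 3)
print(max(abs(u - v) for u, v in zip(direct, emulated)))
```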

We note in passing that the kinds of graph called for in this particular reduction 
are not the usual cluster state graphs that correspond to a regular planar lattice 
arrangement as normally used in measurement-based quantum computation. The 
bipartite graphs described in the reduction here will usually be far from planar, for 
the X-programs that we'll be considering, having a relatively high genus. 

5.3.3 Constructing graph states 

By decomposing the graph constructor $G_{\mathcal{G}}$ into individual gates (cf. §2.2.2), it is
clear that graph states may be constructed from polynomially many single-qubit
rotations and two-qubit interactions. Moreover, a graph state may be constructed
without inherent temporal complexity, because there is no essential reason requir-
ing one edge of the graph (one aspect of entanglement) to be prepared before any
other.

One might argue that a physical implementation of a graph state construction pro-
cess could require time on the order of the valency of the graph in question, because
it might be impractical to have an individual qubit engage in more than one entan-
gling gate at a time. However, even if this latter argument turns out to be relevant
for architectures of interest (cf. [35]), it is still the case that the circuit-depth of
graph state construction is merely logarithmic in the valency of the graph. This is
because for the cost of some extra ancilla qubits, one can employ a binary tree of
C-Not gates to 'fan out' each qubit vertex of the graph onto n 'identical' physical
qubits that together represent the qubit associated to a graph vertex. Then the
logical state $|+\rangle$ is rendered physically as $|00\ldots0\rangle + |11\ldots1\rangle$ on these qubits. The
entangling operation $E_{\mathcal{G}}$ can then be rendered using a circuit of depth one, since each
Controlled-Z gate impinging on the vertex can now use a distinct physical qubit. Of
course, the 'fan-out' procedure must then be reversed after the vertices have been
entangled and before they are measured, again using a binary tree of C-Not gates,
at a cost of logarithmic circuit depth. This trick was pointed out by Moore and
Nilsson in [44], who introduced the class QNC, thereby effectively showing that
IQP circuits are renderable in quantum logarithmic parallel time:
Proposition 5.3.3. $\mathrm{BPP}^{\mathrm{IQP}} \subseteq \mathrm{BPP}^{\mathrm{QNC}^1}$.
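The binary-tree fan-out can be sketched as an explicit schedule. In this sketch (names ours) each layer doubles the number of qubits holding the value, and no qubit takes part in two gates of the same layer. Copying onto m qubits therefore takes $\lceil \log_2 m \rceil$ layers. The circuit is linear, so $\alpha|0\rangle + \beta|1\rangle$ becomes $\alpha|0\ldots0\rangle + \beta|1\ldots1\rangle$; it therefore suffices to track the two computational-basis inputs.

```python
def fanout_layers(m):
    """Layers of C-Not gates (control, target) copying qubit 0 onto qubits 1..m-1
    along a binary tree; no qubit appears twice within a layer."""
    layers, have = [], 1                 # 'have' = number of qubits already holding the value
    while have < m:
        layers.append([(src, src + have) for src in range(have) if src + have < m])
        have = min(2 * have, m)
    return layers

def fan_out_basis_state(bit, m):
    """Track the computational-basis input |bit, 0, ..., 0> through the layers."""
    bits = [bit] + [0] * (m - 1)
    for layer in fanout_layers(m):
        for src, tgt in layer:
            bits[tgt] ^= bits[src]
    return bits

for m in (1, 2, 5, 8, 100):
    print(m, len(fanout_layers(m)), fan_out_basis_state(1, m) == [1] * m)
```

Running the same layers in reverse order undoes the fan-out before measurement, at the same logarithmic depth.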



5.4 Mathematical Analysis 

This section provides mathematical background for analysing X-programs, and hence
IQP computing, for the later algorithmic constructions.

5.4.1 Computational paths 

Using the idea of counting computational paths, we can simplify the expression for
an IQP output probability distribution as follows.

Lemma 5.4.1. The probability distribution given at line (5.5) (repeated below) is
equivalent to the one at line (5.7) given below. (Here P denotes an X-program having
k elements, and we use the same symbol P to denote the k-by-n binary matrix whose
rows are the p vectors of the X-program under consideration.)

$$\mathbb{P}(X = x) := \big| \langle x | \exp(iH_P) | 0 \rangle \big|^2$$

$$= \Big| \sum_{a \in \mathbb{F}_2^k \,:\, a \cdot P = x} \; \prod_{p \,:\, a_p = 0} \cos\theta_p \prod_{p \,:\, a_p = 1} i\sin\theta_p \Big|^2. \qquad (5.7)$$



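Proof. Using the fact that the Hamiltonian terms in an X-program all commute, we
can think of the quantum amplitudes arising in an X-program implementation as a
sum over paths,

$$\langle x | \prod_p \Big( \cos\theta_p + i\sin\theta_p \prod_{j : p_j = 1} X_j \Big) | 0 \rangle = \langle x | \sum_{a \in \mathbb{F}_2^k} \; \prod_{p : a_p = 0} \cos\theta_p \prod_{p : a_p = 1} i\sin\theta_p \prod_{j=1}^{n} X_j^{(a \cdot P)_j} | 0 \rangle, \qquad (5.8)$$

and since the final product of X operators maps $|0\rangle$ to $|a \cdot P\rangle$, only the paths with
$a \cdot P = x$ contribute. Hence we derive a new form for the probability distribution accordingly. □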
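Lemma 5.4.1 can be checked by computing the same amplitude in two ways. This is a pure-Python sketch with our own function names and conventions (rows of P and strings x are tuples, qubit j is bit j). `amplitude_by_paths` enumerates the $2^k$ paths $a \in \mathbb{F}_2^k$ of (5.7). `amplitude_by_state_vector` applies the commuting factors $\cos\theta_p + i\sin\theta_p X_p$ directly.

```python
import itertools
import math

def amplitude_by_paths(P, thetas, x):
    """<x| exp(i H_P) |0> as the path sum of Lemma 5.4.1: sum over a in F_2^k with a.P = x."""
    k, n = len(P), len(P[0])
    total = 0j
    for a in itertools.product((0, 1), repeat=k):
        image = tuple(sum(a[r] * P[r][j] for r in range(k)) % 2 for j in range(n))
        if image == tuple(x):
            term = 1 + 0j
            for a_p, theta in zip(a, thetas):
                term *= 1j * math.sin(theta) if a_p else math.cos(theta)
            total += term
    return total

def amplitude_by_state_vector(P, thetas, x):
    """The same amplitude by direct simulation; qubit j is bit j of the basis index."""
    n = len(P[0])
    psi = [0j] * (2 ** n)
    psi[0] = 1 + 0j
    for p, theta in zip(P, thetas):
        mask = sum(bit << j for j, bit in enumerate(p))
        c, s = math.cos(theta), math.sin(theta)
        psi = [c * psi[y] + 1j * s * psi[y ^ mask] for y in range(2 ** n)]
    return psi[sum(bit << j for j, bit in enumerate(x))]

P = [(1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1), (0, 0, 1)]
thetas = [0.2, 0.5, 0.9, 1.3, math.pi / 8]
for x in itertools.product((0, 1), repeat=3):
    print(x, abs(amplitude_by_paths(P, thetas, x) - amplitude_by_state_vector(P, thetas, x)))
```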

5.4.2 Binary matroids and linear binary codes 

Before proceeding further, it behoves us to establish the link that these formulae have 
with the (closely related) theories of binary matroids and linear binary codes. 

Codes 

Definition 5.4.2. A linear binary code, C, of length k is a (linear) subspace of the
vector space $\mathbb{F}_2^k$, represented explicitly. The elements of C are called codewords, and
the Hamming weight $\mathrm{wt}(c) \in [0..k]$ of some $c \in C$ is defined to be the number of 1s
it has. The rank of C is its rank as a vector space.

Linear binary codes are frequently presented using generator matrices, where the
columns of the generator matrix form a basis for the code. If P is a generator matrix
for a rank n code C, then P has n columns and the codewords are $\{P \cdot d^T : d \in \mathbb{F}_2^n\}$.
Fig. 5.3 includes an example of a rank n = 4 code of length k = 7.

Another nice way to conceptualise the IQP oracle is as a device that forms a uniform
coherent superposition over codewords of a code, before measuring that state using
a locally skewed basis.
Matroids

There are many different, isomorphic, definitions for matroids (see [48]). We shall
adopt the following definition.

Definition 5.4.3. A k-point binary matroid is an equivalence class of matrices
defined over $\mathbb{F}_2$, where each matrix in the equivalence class has exactly k rows, and
two matrices are equivalent (written $M_1 \sim M_2$) when for some (k-by-k) permutation
matrix Q, the column-echelon reduced form of $M_1$ is the same as the column-echelon
reduced form of $Q \cdot M_2$. Here we take column-echelon reduction to delete empty
columns, so that the result is full-rank. Hence the rank of a matroid is the rank of
any of its representatives.

Less formally, this means that a binary matroid is like a matrix over $\mathbb{F}_2$ that doesn't
notice if you rearrange its rows, if you add one of its columns into another (modulo 2),
or if you duplicate one of its columns. This means that a matroid is like the generator
matrix for a linear binary code, but it doesn't mind if it contains redundancy in its
spanning set (i.e. has more columns than its rank) and it doesn't care about the
actual order of the zeroes and ones in the individual codewords. To be clear, when
thinking of a matrix P, we are simultaneously thinking of its columns as the elements
of a spanning set for a code, and its rows as the points of a corresponding matroid, the
elements of an X-program. Because one cannot express a matroid independently of a
representation, we consistently conflate notation for the matrix P with the matroid
P that it represents.

There is a definition in the literature for weighted matroids, which in this context
would correspond to allowing different values for different terms in the Hamiltonian
of an X-program. While mathematically (and physically) natural, such considera-
tions would not help with the clarity of our presentation, and in most of what follows
we are concerned only with X-programs for which all the (non-zero) $\theta_p$ values are
the same.

Weight enumerator polynomials 

Perhaps the main structural feature of a binary matroid is its weight enumerator 
polynomial. 

Definition 5.4.4. If the k rows of binary matrix P establish the points of a k-point
matroid, then the weight enumerator of the matroid is defined to be the weight
enumerator of the k-long code C spanned by the columns of P, which in turn is
defined to be the bivariate polynomial

$$\mathrm{WE}_C(x, y) = \sum_{c \in C} x^{\mathrm{wt}(c)} y^{k - \mathrm{wt}(c)}. \qquad (5.9)$$
This is well-defined, because the effect of choosing a different matrix P that rep-
resents the same binary matroid simply leads to an isomorphic code that has the
same weight-enumerator polynomial as the original code C. The exact evaluation
of an arbitrary weight-enumerator is hard for the polynomial hierarchy: see [70]
for more on the computational complexity of approximating weight-enumerators.
This suggests that a search for any IQP-computing method for evaluating arbi-
trary weight-enumerators might lead one day to a way to put IQP, and hence also
BQP, outside of the polynomial hierarchy.
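The matroid invariance of the weight enumerator can be seen on a small example. This pure-Python sketch uses our own function names, and the example matrix is our own choice, not the matrix of Fig. 5.3. The matrix is the transpose of a standard generator matrix of the [7,4] Hamming code, so its columns span that code. The sketch computes the coefficients $A_w$ of $\mathrm{WE}_C(x, y) = \sum_w A_w x^w y^{k-w}$. It then checks that they are unchanged when the rows are permuted, when one column is added into another, or when a column is duplicated.

```python
import itertools

def code_from_columns(P):
    """Column span over F_2 of a k-row binary matrix P, as a set of length-k tuples."""
    k, m = len(P), len(P[0])
    return {
        tuple(sum(P[r][c] * d[c] for c in range(m)) % 2 for r in range(k))
        for d in itertools.product((0, 1), repeat=m)
    }

def weight_distribution(P):
    """Coefficients A_w of WE_C(x, y) = sum_w A_w x^w y^(k - w) for C = column span of P."""
    k = len(P)
    A = [0] * (k + 1)
    for c in code_from_columns(P):
        A[sum(c)] += 1
    return A

# Transpose of a generator matrix of the [7,4] Hamming code: its columns span the code.
P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
     [0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1]]
row_permuted = P[::-1]                                             # rearrange the points
column_added = [[r[0], (r[0] + r[1]) % 2, r[2], r[3]] for r in P]  # add column 0 into column 1
column_duplicated = [r + [r[0]] for r in P]                        # redundant spanning set

for M in (P, row_permuted, column_added, column_duplicated):
    print(weight_distribution(M))
```

The enumeration is exponential in the number of columns, consistent with the hardness remarks above; it is meant only as an illustration.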

Bias in probability distributions 

Definition 5.4.5. If X is a random variable taking values in $\mathbb{F}_2^n$, and s is any
element of $\mathbb{F}_2^n$, then the bias⁴ of X in direction s is simply the probability that $X \cdot s^T$
is zero, i.e. the probability of a sample being orthogonal to s.

Let us now consider an X-program on n qubits that has constant action value θ,
whose Hamiltonian terms are specified by the rows of matrix P, as discussed earlier.
Then we can use Lemma 5.4.1 with the definition above to obtain the following
expression of bias, for any binary vector $s \in \mathbb{F}_2^n$:

$$\mathbb{P}(X \cdot s^T = 0) = \sum_{x \,:\, x \cdot s^T = 0} \Big| \sum_{a \,:\, a \cdot P = x} (\cos\theta)^{k - \mathrm{wt}(a)} (i\sin\theta)^{\mathrm{wt}(a)} \Big|^2. \qquad (5.10)$$



Since it would be nice to interpret this expression as the evaluation of a weight
enumerator polynomial, we are led to define $P_s$ to be the submatrix of P obtained
by deleting all rows p for which $p \cdot s^T = 0$, leaving only those rows for which $p \cdot s^T = 1$.
We call the number of rows remaining $n_s$. (Note, $n_s$ is here being used for the length
of the code $C_s$ in deference to the usual practice of reserving the letter n for code
lengths. This $n_s$ is counting a number of rows, and should not be confused with the
n used earlier for counting a number of columns.) This in turn leads to the code
$C_s$ being the span of the columns of $P_s$, and likewise a submatroid (also called a
matroid minor) is correspondingly defined.

Theorem 5.4.6. For constant-action X-programs, the bias expression $\mathbb{P}(X \cdot s^T = 0)$
for the random variable X of line (5.5) depends only on the action value θ and
(the weight enumerator polynomial of) the $n_s$-point matroid $P_s$, as defined above.
Moreover, if $C_s$ is a binary code representing the matroid $P_s$, then the following
formula expresses the bias:

$$\mathbb{P}(X \cdot s^T = 0) = \mathbb{E}_{c \sim C_s}\Big[ \cos^2\big( \theta (n_s - 2 \cdot \mathrm{wt}(c)) \big) \Big]. \qquad (5.11)$$



4 Note that this definition of bias, used throughout this Chapter, is a little different from the
definition used in §3.1.1, simply to help shorten some of the formulas.
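Proof. To derive line (5.11) from line (5.5) in the case that the value θ is constant,
proceed as follows. Throughout, the variable p ranges over the rows of the binary
matrix P, which are the program elements of an X-program, and the variables a, d
range over $\mathbb{F}_2^n$.

$$\begin{aligned}
\mathbb{P}(X = x) &= \Big| \langle x | \exp\Big( i\theta \sum_p \prod_{j : p_j = 1} X_j \Big) | 0 \rangle \Big|^2 \\
&= \Big| 2^{-n} \sum_a (-1)^{x \cdot a^T} \langle a | \exp\Big( i\theta \sum_p \prod_{j : p_j = 1} Z_j \Big) | a \rangle \Big|^2 \\
&= 2^{-2n} \sum_{a,d} (-1)^{x \cdot a^T} \exp\Big( i\theta \sum_p (-1)^{p \cdot a^T} \Big) (-1)^{x \cdot d^T} \exp\Big( -i\theta \sum_p (-1)^{p \cdot d^T} \Big) \\
&= 2^{-2n} \sum_{a,d} (-1)^{x \cdot d^T} \exp\Big( i\theta \sum_p (-1)^{p \cdot a^T} \big( 1 - (-1)^{p \cdot d^T} \big) \Big).
\end{aligned} \qquad (5.12)$$

On the second line we made a change of basis, so as to replace the Pauli X operators
with Pauli Z ones; on the fourth line we substituted $a \oplus d$ for d.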



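$$\begin{aligned}
\mathbb{P}(X \cdot s^T = 0) &= \sum_x [x \cdot s^T = 0] \cdot \mathbb{P}(X = x) \\
&= 2^{-2n} \sum_{a,d,x} \tfrac{1}{2} \big( 1 + (-1)^{x \cdot s^T} \big) (-1)^{x \cdot d^T} \exp\Big( i\theta \sum_p (-1)^{p \cdot a^T} \big( 1 - (-1)^{p \cdot d^T} \big) \Big) \\
&= 2^{-n} \sum_{a,d} \tfrac{1}{2} \big( [d = 0] + [d = s] \big) \exp\Big( i\theta \sum_p (-1)^{p \cdot a^T} \big( 1 - (-1)^{p \cdot d^T} \big) \Big) \\
&= \tfrac{1}{2} \Big( 1 + 2^{-n} \sum_a \exp\Big( i\theta \sum_p (-1)^{p \cdot a^T} \big( 1 - (-1)^{p \cdot s^T} \big) \Big) \Big).
\end{aligned} \qquad (5.13)$$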



These transformations are conceptually simple but notationally untidy. 



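$$\begin{aligned}
2 \cdot \mathbb{P}(X \cdot s^T = 0) - 1 &= 2^{-n} \sum_a \exp\Big( i\theta \sum_p (-1)^{p \cdot a^T} \big( 1 - (-1)^{p \cdot s^T} \big) \Big) \\
&= \sum_j e^{ij\theta} \cdot \mathbb{P}\Big( j = 2 \sum_{p \,:\, p \cdot s^T = 1} (-1)^{p \cdot a^T} \;\Big|\; a \sim \mathbb{F}_2^n \Big) \\
&= \sum_j e^{ij\theta} \cdot \mathbb{P}\big( j = 2 (n_s - 2 \cdot \mathrm{wt}(c)) \;\big|\; c \sim C_s \big) \\
&= \sum_{w=0}^{n_s} \cos\big( 2\theta (n_s - 2w) \big) \cdot \mathbb{P}\big( w = \mathrm{wt}(c) \;\big|\; c \sim C_s \big).
\end{aligned} \qquad (5.14)$$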
Here we have used the standard Fourier decomposition of a periodic function, and
used the fact that the function is known to be real. The variable substitution at the
third line was $c = P_s \cdot a^T$, understood in the correct basis. At the fourth line it was
$w = (2n_s - j)/4$. Then

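$$\begin{aligned}
\mathbb{P}(X \cdot s^T = 0) &= \sum_{w=0}^{n_s} \cos^2\big( \theta (n_s - 2w) \big) \cdot \mathbb{P}\big( w = \mathrm{wt}(c) \;\big|\; c \sim C_s \big) \\
&= \mathbb{E}_{c \sim C_s}\Big[ \cos^2\big( \theta (n_s - 2 \cdot \mathrm{wt}(c)) \big) \Big].
\end{aligned} \qquad (5.15)$$

□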

To recap, this means that if we run an X-program using the action value θ for all
program elements, then the probability of the returned sample being orthogonal to
an s of our choosing depends only on θ and on the (weight enumerator polynomial
of the) linear code obtained by writing the program elements p as rows of a matrix
and ignoring those that are orthogonal to s.
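Theorem 5.4.6 can be checked numerically for a small constant-action program. This is a pure-Python sketch; the function names and the example matrix are ours. `bias_direct` sums the output probabilities of a state-vector simulation over all x with $x \cdot s^T = 0$. `bias_from_code` evaluates the right-hand side of (5.11) by running over $a \in \mathbb{F}_2^n$ and forming $c = P_s \cdot a^T$. Each codeword of $C_s$ is hit equally often, so this average is exactly $\mathbb{E}_{c \sim C_s}$.

```python
import itertools
import math

def bias_direct(P, theta, s):
    """P(X . s^T = 0) by state-vector simulation of exp(i theta sum_p X_p)|0> (qubit j = bit j)."""
    n = len(P[0])
    psi = [0j] * (2 ** n)
    psi[0] = 1 + 0j
    c, sn = math.cos(theta), math.sin(theta)
    for p in P:
        mask = sum(bit << j for j, bit in enumerate(p))
        psi = [c * psi[y] + 1j * sn * psi[y ^ mask] for y in range(2 ** n)]
    smask = sum(bit << j for j, bit in enumerate(s))
    return sum(abs(a) ** 2 for y, a in enumerate(psi) if bin(y & smask).count("1") % 2 == 0)

def bias_from_code(P, theta, s):
    """Right-hand side of (5.11): average of cos^2(theta (n_s - 2 wt(c))) over c in C_s."""
    n = len(P[0])
    P_s = [p for p in P if sum(pj * sj for pj, sj in zip(p, s)) % 2 == 1]
    n_s = len(P_s)
    total = 0.0
    # c = P_s . a^T; as a runs over F_2^n, each codeword of C_s is hit equally often
    for a in itertools.product((0, 1), repeat=n):
        wt = sum(sum(pj * aj for pj, aj in zip(p, a)) % 2 for p in P_s)
        total += math.cos(theta * (n_s - 2 * wt)) ** 2
    return total / 2 ** n

P = [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1), (1, 0, 0, 1), (1, 1, 1, 0), (0, 1, 1, 1)]
for theta in (math.pi / 8, 0.37):
    gap = max(abs(bias_direct(P, theta, s) - bias_from_code(P, theta, s))
              for s in itertools.product((0, 1), repeat=4))
    print(theta, gap)
```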

We emphasise at this point the value of Theorem 5.4.6: it means that for any
direction $s \in \mathbb{F}_2^n$, the bias of the output probability distribution from an X-program
(P, θ) in the direction s depends only on θ and the rows of P that are not orthogonal
to s, and not at all on the rows of P that are orthogonal to s. Moreover, the bias in
direction s depends only on the matroid $P_s$, and not on the particular matrix $P_s$ that
represents it. That is, directional bias (Definition 5.4.5) is a matroid invariant.

Note that whenever $A$ is an $n$-by-$n$ invertible matrix over $\mathbb{F}_2$,

$$p \cdot s^T = p \cdot A \cdot A^{-1} \cdot s^T = (p \cdot A) \cdot (s \cdot A^{-T})^T, \qquad (5.16)$$

so any invertible column operation on matrix $P$ is accompanied by an invertible change of basis for the set of directions of which $s$ is a member. Note also that appending or removing an all-zero column to $P$ has the effect of including or excluding a qubit on which no unitary transformations are performed. Thus if $P_s$ is a submatroid of $P$ by point-deletion, as described earlier, and the invertible column transformation $A$ is applied to the matrix $P$ that represents the matroid $P$, then the same matroid that was formerly called $P_s$ is still a submatroid, but it is now picked out by the direction $s \cdot A^{-T}$ and represented by the matrix $P_s \cdot A$. Likewise, appending or removing a column of zeroes to $P$ requires that an extra zero be appended to or removed from any $s$ that serves as a direction for indicating a submatroid. This is purely an issue of representation, and we consider that intuition about these objects is aided by taking an 'abstractist' approach to the geometry, thinking of the matroid as the fundamental object.
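Identity (5.16) is easy to spot-check numerically. In the minimal sketch below (all helper names are hypothetical), a random invertible $A$ over $\mathbb{F}_2$ is applied to every row of $P$ and the direction is moved to $s \cdot A^{-T}$; the set of rows that are not orthogonal to the direction, and hence the submatroid $P_s$, comes out unchanged.

```python
import random


def gf2_inverse(A):
    """Inverse of a square 0/1 matrix over F_2 by Gauss-Jordan elimination (None if singular)."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col]), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [x ^ y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def vec_mat(v, A):
    """Row vector times matrix over F_2."""
    return [sum(v[i] & A[i][j] for i in range(len(v))) % 2 for j in range(len(A[0]))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(u, v):
    return sum(x & y for x, y in zip(u, v)) % 2


def change_basis(P, s, A):
    """Apply the column operation A to every row (p -> p A) and move s to s A^{-T}."""
    Ainv = gf2_inverse(A)
    return [vec_mat(p, A) for p in P], vec_mat(s, transpose(Ainv))


def random_invertible(n, rng):
    """Rejection-sample a uniformly random invertible n-by-n matrix over F_2."""
    while True:
        A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
        if gf2_inverse(A) is not None:
            return A


if __name__ == "__main__":
    rng = random.Random(0)
    n = 5
    P = [[rng.randint(0, 1) for _ in range(n)] for _ in range(8)]
    s = [rng.randint(0, 1) for _ in range(n)]
    P2, s2 = change_basis(P, s, random_invertible(n, rng))
    # The same rows are picked out before and after the change of basis.
    print([dot(p, s) for p in P] == [dot(p, s2) for p in P2])
```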
5.4.3 Matroids, unitaries, Hamiltonians, probability distributions

We have seen that a $k$-point matroid $P$ with a constant action value $\theta$ defines a Hamiltonian $H_P$ on $k$ qubits (up to qubit ordering), which in turn defines a unitary $\exp(iH_P)$, and thence a random variable $X$ on $\mathbb{F}_2^n$ having an interesting probability distribution. Yet it is possible that two different matroids could give rise to the same probability distribution, because two different Hamiltonians can give rise to the same unitary map.

Consider the case whereby $\theta = \pi/8$. Write the Pauli $X$ gate alternatively as $1 - 2x$, so that $x = (1 - X)/2$ is represented by an integral matrix in the diagonal basis. Then any term $\frac{\pi}{8} X_a X_b \cdots X_c$ of $H_P$ can be expanded into many terms by multiplying out the expression $\frac{\pi}{8}(1 - 2x_a)(1 - 2x_b)\cdots(1 - 2x_c)$. We need only keep the monomial terms of degree 3 or less in the $x$ variables, since a monomial of degree $d$ has coefficient $\pm 2^d \pi/8$, so the higher-order terms have coefficients that are multiples of $2\pi$ and will therefore not be 'seen' in the resulting unitary map. Note that this expansion and truncation will cause the number of terms to 'explode' only polynomially, not exponentially. Now, rewriting each monomial back in terms of $\pi/8$ and $X$ variables, we end up with a Hamiltonian that has 3-qubit interactions at worst. The resulting matroid is possibly larger than the initial one, but it possesses a 'sparse' representative matrix whose every row has Hamming weight at most 3.
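The truncation argument can be checked exactly in integer arithmetic. The sketch below works in units of $\pi/8$ and in the Hadamard basis, where $X_a X_b \cdots X_c$ acts on $|a\rangle$ as the sign $\prod_i (1 - 2a_i)$; it compares the full phase with the expansion truncated at degree 3, modulo 16 (that is, modulo $2\pi$), and counts how many monomials survive. The function names are hypothetical.

```python
import itertools
from math import comb


def full_phase(bits):
    """Phase of (pi/8) X_1 X_2 ... X_k on Hadamard-basis state |bits>, in units of pi/8."""
    return (-1) ** sum(bits)


def truncated_phase(bits, max_degree=3):
    """Expand prod_i (1 - 2 x_i) into monomials, keeping those of degree <= max_degree."""
    total = 0
    for d in range(max_degree + 1):
        for S in itertools.combinations(range(len(bits)), d):
            # The monomial prod_{i in S} x_i has coefficient (-2)^d.
            total += (-2) ** d * all(bits[i] for i in S)
    return total


def kept_terms(k, max_degree=3):
    """Number of monomials retained for a weight-k term: O(k^3) rather than 2^k."""
    return sum(comb(k, d) for d in range(max_degree + 1))


if __name__ == "__main__":
    for k in range(1, 9):
        ok = all((full_phase(b) - truncated_phase(b)) % 16 == 0
                 for b in itertools.product((0, 1), repeat=k))
        print(k, ok, kept_terms(k))
```

Truncating at degree 2 instead fails already for $k = 3$, since the dropped cubic coefficient $-8$ is not a multiple of 16.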

This sort of trick can be useful in understanding the complexity of IQP algorithms, and in tailoring designs to particular architectures. It also puts an equivalence class structure on the set of all (unweighted) binary matroids, which may be of independent interest.

5.4.4 Entropy, and trivial cases 

Because it will be useful later, we will define the Rényi entropy (collision entropy)
of a random variable, before exemplifying Theorem 5.4.6 and proceeding with the 
main construction of this Chapter. 

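Definition 5.4.7. The collision entropy $S_2$ of a discrete random variable $X$ measures the randomness of the sampling process by measuring the likelihood of two (independent) samples being the same. It is defined by

$$2^{-S_2} = \sum_x P(X = x)^2 = \mathbb{E}_s\Big[\big(2P(X \cdot s^T = 0) - 1\big)^2\Big], \qquad (5.17)$$

where, for $X$ on $\mathbb{F}_2^n$ and $s$ uniform on $\mathbb{F}_2^n$, the second equality is Parseval's identity applied to the biases $2P(X \cdot s^T = 0) - 1 = \mathbb{E}\big[(-1)^{X \cdot s^T}\big]$.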
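A short numerical check of (5.17) follows. It assumes $X$ takes values in $\mathbb{F}_2^n$ and that its distribution is supplied as a dictionary from outcome tuples to probabilities (a hypothetical representation); the sketch computes $S_2$ directly and again from the directional biases averaged over $s$.

```python
import itertools
import math


def collision_entropy(dist):
    """S_2 = -log2 sum_x P(X = x)^2, for a dict {outcome: probability}."""
    return -math.log2(sum(p * p for p in dist.values()))


def collision_entropy_from_biases(dist, n):
    """The same S_2 from the biases 2 P(X.s^T = 0) - 1, averaged over all s in F_2^n."""
    total = 0.0
    for s in itertools.product((0, 1), repeat=n):
        p0 = sum(pr for x, pr in dist.items()
                 if sum(xi & si for xi, si in zip(x, s)) % 2 == 0)
        total += (2 * p0 - 1) ** 2
    return -math.log2(total / 2 ** n)


if __name__ == "__main__":
    dist = {(0, 0, 0): 0.5, (1, 0, 1): 0.25, (0, 1, 1): 0.125, (1, 1, 1): 0.125}
    print(collision_entropy(dist), collision_entropy_from_biases(dist, 3))
```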



There are, then, a few 'easy cases' for our random variable $X$ of Lemma 5.4.1 that should be highlighted and dismissed up front:
Lemma 5.4.8. For a constant-action X-program, if $\theta$ is. . .

• . . . a multiple of $\pi$, then the returned sample will always be 0. The collision entropy is zero.

• . . . an odd multiple of $\pi/2$, then the returned sample will always be $\sum_{p \in P} p$. The collision entropy is zero.

• . . . an odd multiple of $\pi/4$, then the collision entropy need not be zero, but the probability distribution will be classically simulable to full precision.

Proof. In the first case, considering line (5.7), there is then a factor $\sin(\pi) = 0$ in every term of the probability, except where $x = 0$.

In the second case, considering again line (5.7), there is then a factor $\cos(\pi/2) = 0$ in every term, except where all the $p$ vectors are summed together to give $x$. The same can also be deduced from Theorem 5.4.6, which implies that $X$ will surely be orthogonal to $s$ exactly when $n_s$ is even, i.e. exactly when an even number of rows of $P$ are not orthogonal to $s$, i.e. exactly when $\sum_{p \in P} p$ is orthogonal to $s$.

For the third case, if $\theta$ is an odd multiple of $\pi/4$, then all the gates in the program would be Clifford gates. By the Gottesman-Knill theorem there is then a classically efficient method for sampling from the distribution, by tracking the evolution of the system using stabilisers, &c. □
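The first two cases of Lemma 5.4.8 can be confirmed by brute force on a handful of qubits. The sketch below uses a hypothetical exact simulator, which assumes the program unitary is $\exp(i\theta \sum_p X^p)$ and costs time exponential in $n$, and checks that all the probability sits on $0$ when $\theta = \pi$, and on $\sum_{p \in P} p$ when $\theta$ is an odd multiple of $\pi/2$. The Clifford case is not exercised here.

```python
import cmath
import itertools
import math


def dot(u, v):
    return sum(x & y for x, y in zip(u, v)) % 2


def xprog_distribution(P, theta):
    """Exact output distribution of the X-program (P, theta), computed in the Hadamard basis."""
    n = len(P[0])
    pts = list(itertools.product((0, 1), repeat=n))
    phase = {a: cmath.exp(1j * theta * sum((-1) ** dot(p, a) for p in P)) for a in pts}
    return {x: abs(sum(phase[a] * (-1) ** dot(a, x) for a in pts)) ** 2 / 4 ** n
            for x in pts}


def row_sum(P):
    """The F_2 sum of all rows of P."""
    return tuple(sum(col) % 2 for col in zip(*P))


if __name__ == "__main__":
    P = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1)]
    # theta = pi: all weight on 0; theta = pi/2: all weight on the row sum.
    print(xprog_distribution(P, math.pi)[(0, 0, 0)])
    print(xprog_distribution(P, math.pi / 2)[row_sum(P)])
```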

For other sufficiently different values of the action parameter, classical intractability becomes a plausible conjecture (cf. [55]). In particular, the remainder of this Chapter will specialise to the case $\theta = \pi/8$, since we are able to make all our points about the utility of IQP computing even with this restriction.

Conjecture 5.4.9. The expected collision entropy of the probability distribution of a randomly selected X-program of width $n$, with constant action $\pi/8$, scales as $n - O(1)$.

This conjecture is perhaps not directly relevant to the 'hardness' of the IQP paradigm
itself, but it is implicitly relevant to the design of the kind of hypothesis test that 
can legitimately be used to constitute the final part of the interactive proof game 
discussed next. It is future work to prove this conjecture and gain a better under- 
standing of random matroids in the context of quantum computation. 

5.5 Interactive Protocol 

One would naturally like to find some 'use' for the ability to sample from the probability distribution that arises from a temporally unstructured quantum computation; a 'task' or 'proof' that can be completed using e.g. an X-program, which could presumably not be completed by purely classical means. In this section we develop our main construction towards that goal: a two-player interactive protocol game, with classical message passing, in which a Prover uses an IQP oracle simply to demonstrate that he does have access to an IQP oracle.

Perhaps such algorithms constructed in $\mathrm{BPP}^{\mathrm{IQP}}$ will be found to be the simplest algorithms for demonstrating quantum computing, provided it is believed that they cannot be efficiently simulated classically. Certainly such algorithms stand a good chance of being much simpler, and requiring far fewer qubits, than the algorithms in BQP (or $\mathrm{FH}_2$, cf. Chapter 4) that solve reasonably hard instances of certain NP problems.

5.5.1 At a glance 

This section gives a brief overview of our protocol. Alice plays the role of the 
Challenger/Verifier, while Bob plays the role of the Prover. There are three aspects 
of design involved in specifying an actual "Alice & Bob" game : 

A) a code/matroid construction, for Alice to select a problem P, to send to Bob; 

B) an architecture or technique by which Bob is able to take samples from the IQP distribution of the challenge he receives, to send back to Alice;

A') an hypothesis test for Alice to use to verify (or reject) Bob's attempt. 

Alice uses secret random data to obfuscate a 'causal' matroid $P_s$ inside a larger matroid $P$, and publishes the latter (as a matrix) to Bob. Bob interprets matrix $P$ as an X-program to be run several times, with $\theta = \pi/8$. He collects the returned samples, and sends them to Alice. Alice then uses her secret knowledge of 'where' in $P$ the special $P_s$ matroid is hidden, in order to run a statistical test on Bob's data, to validate or refute the notion that Bob has the ability to run X-programs.

This application is perhaps the simplest known protocol, requiring (say) ~ 200 
qubits, that could be expected to convince a skeptic of the existence of some com- 
putational quantum effect. The reason for this is that there seems to be no classical 
method to fake even a classical transcript of a run of the interactive game between 
Challenger and Prover, without actually being (or subverting the secret random 
data of) the classical Challenger. In this sense, the verification may be said to be
"device-independent".

In §5.6 we make an analysis⁵ of some (best-known) classical cheating strategies for
Bob, though these are shown to be insufficient in general.



5 The details of precisely how to make a good hypothesis test are omitted from this work for the sake of brevity, but source code is available.
5.5.2 More details

Consider therefore the following game, played between Alice and Bob. Alice, also called the Challenger/Verifier, is a classical player with access to a private random number generator. Bob, also called the Prover, is a supposedly quantum player, whose goal is to convince Alice that he can access an IQP oracle, i.e. run X-programs. The rules of this game are that he has to convince her simply by sending classical data, and so in effect Bob offers to act as a remote IQP oracle for Alice, while Alice is initially skeptical of Bob's true IQP abilities.

Alice's challenge 

The game begins with Alice choosing some code $C_s$ that has certain properties amenable to her analysis. In particular, she chooses the code $C_s$ in such a way that all the known classical cheating strategies of §5.6 are defeated. Details are given in §5.5.3.

She then finds a matrix $P_s$ whose columns generate the code (not necessarily as a basis), and ensures that there is some $s$ that is not orthogonal to any of the rows of $P_s$. The vector $s$ should be thought of not as a structural property of the code $C_s$, but as a secret 'locator' that she can use to 'pinpoint' $P_s$ even after it has later been obfuscated.

Obfuscation of $P_s$ is achieved by appending arbitrary rows that are orthogonal to $s$. This gives rise to matrix $P$. The matroid $P$ has $P_s$ as a submatroid, in the sense that removal of the correct set of rows will recover $P_s$. Alice publishes to Bob a representation of matroid $P$ that hides the structure that she has embedded. Random row permutations are appropriate, and reversible column operations likewise leave the matroid invariant (though the latter will affect $s$ and must therefore be tracked by Alice).

Bob's proof 

Bob, by hypothesis being capable of sampling from an IQP distribution, may interpret the published $P$ as an X-program, to be run with the (constant) action set to $\theta = \pi/8$ (say). He will be able to generate random vectors which independently have the correct bias in the direction $s$ (unknown to him), i.e. the correct probability of being orthogonal to Alice's secret $s$, in accordance with Theorem 5.4.6. Although he may still be entirely unable to recover this $s$ from such samples, he nonetheless can send to Alice a list of these samples as proof that he is 'IQP-capable'. Note that Bob's strategy is error-tolerant, because if each run of the IQP oracle were to use a 'noisy' value, then the overall proof that he generates will still be valid, provided the noise is small and unbiased and independent between runs. Note also that Bob can manage several runs in one oracle call, if desired, simply by concatenating the matrix $P$ with itself diagonally. That is to say, we even avoid classical temporal structure (adaptive feed-forward) on Bob's part, and so can regard his part of the protocol as lying within IQP.

Alice's verification 

Since Alice knows the secret value $s$, and can presumably compute the value $P(X \cdot s^T = 0)$ from the code's weight enumerator polynomial (see Theorem 5.4.6 and recall that she is free to choose any $C_s$ that suits her purpose), it is not hard for her to use a hypothesis test to confirm that the samples Bob sends are commensurate with having been sampled independently from the same distribution that an X-program generates. That is to say, Alice will not try to test whether Bob's data definitely fits the correct IQP distribution, but she will ensure that it has the particular characteristic of a strong bias in the secret direction $s$. This enables her to test the null hypothesis that Bob is cheating against the alternative hypothesis that Bob has non-trivial quantum computational power.

This requires belief in several conjectures on Alice's part. She must believe that there is a separation between quantum and classical computing; in particular, that an IQP oracle is not classically efficiently approximately simulable (at least, she must believe that Bob doesn't know any good simulation tricks). And she must believe that her problem is hard: at least, she should believe that the problem of identifying the location of $P_s$ within $P$ is not a BPP problem, on the assumption that the matroid $P_s$ is known.

If he passes her hypothesis test, Bob will have 'proved' to Alice that he ran a quantum computation on her program, provided she is confident that there is no feasible way for Bob to simulate the 'proof' data classically efficiently, i.e. provided she has performed her hypothesis test correctly against a plausibly best null hypothesis.

Since Alice will test to see whether Bob's data has a strong bias in direction $s$ (known only to her), she should first of all ensure that Bob's data does not have a strong bias in many directions simultaneously. This is easily done by removing all 'short circuits' (i.e. all the empty rows and all the duplicate rows) from Bob's data, before testing it. Bob's data would not be expected to contain short circuits if the collision entropy of the distribution were high, and so Conjecture 5.4.9 is relevant in this sense: we believe that the collision entropy of an IQP distribution formed as described in §5.5.3 will indeed be large.
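Alice's filtering and test can be sketched as follows. This is only an illustration under assumptions that the text leaves open: the null hypothesis is taken to be a sampler whose samples are orthogonal to $s$ independently with probability 0.75 (the rate of the best-known classical strategy for the codes of §5.5.3), the significance level `alpha` is a hypothetical choice, and repeated samples are collapsed to a single copy. It is not the authors' hypothesis test, whose details are omitted from this work.

```python
from fractions import Fraction
from math import comb


def filter_short_circuits(samples):
    """Drop all-zero samples and keep one copy of each distinct sample.

    Keeping a single representative of repeated samples is an assumption;
    discarding every copy would be an equally valid reading of the text.
    """
    seen, kept = set(), []
    for x in samples:
        t = tuple(x)
        if any(t) and t not in seen:
            seen.add(t)
            kept.append(t)
    return kept


def count_orthogonal(samples, s):
    """Number of samples x with x.s^T = 0 over F_2."""
    return sum(1 for x in samples if sum(a & b for a, b in zip(x, s)) % 2 == 0)


def binomial_upper_tail(k, m, p0):
    """P(Bin(m, p0) >= k), computed exactly with rationals to avoid float overflow."""
    p = Fraction(p0)
    return float(sum(comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(k, m + 1)))


def alice_accepts(samples, s, p_null=0.75, alpha=1e-6):
    """Reject the null 'Bob samples at orthogonality rate p_null' if its tail is below alpha."""
    data = filter_short_circuits(samples)
    k = count_orthogonal(data, s)
    return binomial_upper_tail(k, len(data), p_null) < alpha
```

The quantity `binomial_upper_tail` is the chance that a cheater achieving rate `p_null` would do at least as well as Bob did; an empty filtered dataset is always rejected.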
Significance

This kind of interactive game could be of much significance to validation of early 
quantum computing architectures, since it gives rise to a simple way of 'tomographi- 
cally ascertaining' the actual presence of at least some quantum computing, modulo 
some relatively basic complexity assumptions, in a 'device-independent' fashion. In 
this sense it is to quantum computation what Bell violation experiments are to quan- 
tum communication. (We have serendipitously identified a construction for which 
the probability gap — quantum 85.4% over classical 75% — precisely matches the gap 
available from Bell's inequalities. See Lemma 5.5.2, §5.5.3.) 

Note that this 'testing concept' does not use the IQP paradigm to compute any data that is unknown to everyone (since Alice must know $s$ if her verification is to work), nor does it directly provide Bob with any 'secret' data that could be used as a witness to validate an NP language membership claim (Bob doesn't really 'learn' anything from his experiments). Its only effect is to provide Bob with data that he can't apparently use for any purpose other than to pass on to Alice as a 'proof of IQP-capability'. It is an open problem to find something more commonly associated with computation (perhaps deciding a decision language, for example) that can be achieved specifically with IQP oracle calls.

5.5.3 Recommended construction method 

This section covers a specific example of a construction methodology (with im- 
plicit test methodology) for Alice, which we conjecture to be asymptotically secure 
(against cheating Prover) and efficient (for both Prover and Verifier). We emphasise 
here that it seems not unreasonable for Alice to believe that Bob can have no clas- 
sical cheating strategy so long as none such has been published nor proven to exist, 
and so our protocol may still serve as a demonstration (if not a proof) of a gen- 
uinely quantum computing phenomenon, despite the lack of proof of any simulation 
conjecture. 

Recipe for codes 

The family of codes that we suggest Alice should employ within the context of the 
game outlined above are the quadratic residue codes. These will be shown to have 
the significant property that there is a non-negligible gap between the quantum- 
and best-known-classical-approximation expectation values for the bias in the secret 
direction, both of which are significantly below unity. 

Consider a quadratic residue code over $\mathbb{F}_2$ with respect to the prime $q$, chosen so that $q + 1$ is a multiple of eight. The rank of such a code is $(q + 1)/2$, and the length is $q$. A quadratic residue code is a cyclic code, and can be specified by a single cyclic generator. There are several ways of defining these, but the simplest definition for our purposes is as follows.

Definition 5.5.1. The quadratic residue code (QR-Code) of prime length $q$, where 8 divides $q + 1$, is a cyclic code over $\mathbb{F}_2$ generated by the codeword that has a 1 in the $j$th place if and only if the Legendre symbol $\left(\frac{j}{q}\right)$ equals 1 (i.e. if and only if $j$ is a non-zero quadratic residue modulo $q$).

For example, if $q = 7$ (the smallest example) then the non-zero quadratic residues modulo $q$ are $\{1, 2, 4\}$, and so the quadratic residue code in question is the rank-4 code spanned by the various rotations of the generator $(0, 1, 1, 0, 1, 0, 0)^T$ (places indexed from 0).
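The generator in Definition 5.5.1 can be computed with Euler's criterion, $j^{(q-1)/2} \equiv 1 \pmod q$ exactly for the non-zero residues $j$. The sketch below (helper names hypothetical) builds it and finds the rank over $\mathbb{F}_2$ of the span of its rotations; for $q = 7$ it recovers the generator above and rank 4.

```python
def qr_generator(q):
    """0/1 list with a 1 at place j (j = 0..q-1) iff j is a non-zero quadratic residue mod q.

    Uses Euler's criterion j^((q-1)/2) = 1 (mod q) for the Legendre symbol.
    """
    return [1 if j % q and pow(j, (q - 1) // 2, q) == 1 else 0 for j in range(q)]


def rotations(v):
    """All cyclic rotations of the list v."""
    return [v[k:] + v[:k] for k in range(len(v))]


def gf2_rank(rows):
    """Rank over F_2 of a list of 0/1 lists, storing one basis vector per leading bit."""
    pivots = {}
    for v in rows:
        x = int("".join(map(str, v)), 2)
        while x:
            top = x.bit_length() - 1
            if top not in pivots:
                pivots[top] = x
                break
            x ^= pivots[top]
    return len(pivots)


if __name__ == "__main__":
    g = qr_generator(7)
    print(g, gf2_rank(rotations(g)))   # prints [0, 1, 1, 0, 1, 0, 0] 4
```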

Lemma 5.5.2. When $q$ is a prime and 8 divides $q + 1$, then the quadratic residue code $C$ of length $q$ has rank $(q + 1)/2$, and it satisfies

$$\mathbb{E}_{c \sim C}\Big[\cos^2\Big(\frac{\pi}{8}\big(q - 2\,\mathrm{wt}(c)\big)\Big)\Big] = \cos^2(\pi/8) = 0.854\ldots \qquad (5.18)$$

Moreover, it also satisfies

$$P\big(c_1 \cdot c_2^T = 0 \mid c_1, c_2 \sim C\big) = 3/4 = 0.75, \qquad (5.19)$$

which is relevant to certain classical strategies (described in §5.6).

Proof. The proof for the rank of the code is a well-established result from classical coding theory (see [43]). Other classical results of coding theory include that quadratic residue codes are a parity bit short of being self-dual and doubly even. That is, the extended quadratic residue code, with length $q + 1$, obtained by appending a single parity bit to each codeword, has every codeword weight a multiple of 4 and every two codewords orthogonal.

For line (5.18) this means that the (unextended) code has codeword weights which, modulo 4, are 0 half the time and $-1$ half the time (half the codewords have odd weight, because adding the all-ones codeword, of odd weight $q$, swaps the two parities). Since $q \equiv -1 \pmod 8$, either case gives $q - 2\,\mathrm{wt}(c) \equiv \pm 1 \pmod 8$, so every term on the left side of the formula equals $\cos^2(\pi/8)$, which is the right side. For line (5.19) this means that in the (unextended) code, any two codewords are non-orthogonal if and only if they are both odd-parity, which happens a quarter of the time; whence the formula follows. □
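Lemma 5.5.2 can be confirmed by enumerating small QR codes. The sketch below (helper names hypothetical; codewords stored as $q$-bit integers) builds the code from the rotations of the generator, then evaluates the left side of (5.18) and the probability in (5.19) exactly. For (5.19) it uses the fact that $c_2 \mapsto c_1 \cdot c_2^T$ is either identically zero on the code or a non-zero linear form that vanishes on exactly half of it. It lists all $2^{(q+1)/2}$ codewords, so it is practical only for small $q$ such as 7 or 23.

```python
import math


def qr_generator(q):
    """Indicator of the non-zero quadratic residues mod q (Euler's criterion)."""
    return [1 if j % q and pow(j, (q - 1) // 2, q) == 1 else 0 for j in range(q)]


def qr_code(q):
    """A basis and the full list of codewords spanned by the rotations of the generator."""
    g = int("".join(map(str, qr_generator(q))), 2)
    mask = (1 << q) - 1
    pivots = {}
    for k in range(q):
        x = ((g >> k) | (g << (q - k))) & mask   # cyclic rotation by k places
        while x:
            top = x.bit_length() - 1
            if top not in pivots:
                pivots[top] = x
                break
            x ^= pivots[top]
    basis = list(pivots.values())
    words = [0]
    for b in basis:
        words += [w ^ b for w in words]
    return basis, words


def weight(w):
    return bin(w).count("1")


def lemma_5_5_2(q):
    """Return (rank, left side of (5.18), left side of (5.19)) for the QR code of length q."""
    basis, words = qr_code(q)
    lhs = sum(math.cos(math.pi / 8 * (q - 2 * weight(c))) ** 2 for c in words) / len(words)
    # c1 orthogonal to the whole code: probability 1 over c2; otherwise exactly 1/2.
    p_orth = sum(1.0 if all(weight(c & b) % 2 == 0 for b in basis) else 0.5
                 for c in words) / len(words)
    return len(basis), lhs, p_orth


if __name__ == "__main__":
    for q in (7, 23):
        print(q, lemma_5_5_2(q), math.cos(math.pi / 8) ** 2)
```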

The corollary here is that if Alice uses one of these codes for her 'causal' $C_s$, then if Bob runs a series of X-programs (with constant $\theta = \pi/8$) described by the (larger) matrix $P$, the data samples he recovers should be orthogonal to the hidden $s$ about 85.4% of the time (cf. Theorem 5.4.6); whereas if Bob tries to cheat using the classical strategy outlined in §5.6, then his data samples will tend to be orthogonal to the hidden $s$ only about 75% of the time (cf. Lemma 5.6.2). Alice's hypothesis test therefore basically consists in measuring this single characteristic, after having filtered duplicate and null data samples from Bob's dataset. We conjecture that Bob has no pragmatic way of boosting these signals, at least not without feedback from Alice or by expending exponential computing resources. (In fact, it would suffice for Alice to take any singly-punctured doubly-even code for this application.)

Note that with exponential time on his hands, Bob could choose to simulate classically an IQP oracle, in order to obtain a dataset with a bias in direction $s$ that is approximately 85.4%. Alternatively, he could consider every possible $s$ in turn, and test to see whether the matroid obtained by deleting rows orthogonal to his guess in fact corresponds to a quadratic residue code, assuming he knew that this had been Alice's strategy. For these reasons, $q$ should be fairly large in any practical example (say a few hundred), to preclude such exhaustive cheating strategies.

Recipe for obfuscation 

Having chosen $q$ as outlined above, and constructed a $q$-by-$(q + 1)/2$ binary matrix generating a quadratic residue code, Alice needs to obfuscate it. The easiest way to manage this process is not to start with a particular secret $s$ in mind, but rather to recognise the obfuscation problem as a matroid problem, proceeding as follows:

• Append a column of 1s to the matrix: this does not change the code spanned by its columns, since the all-ones (full-weight) vector is always a codeword of a quadratic residue code. Other redundant column codewords may also be appended, if desired.

• Append many (say $q$) extra rows to the matrix, each of which is random, subject to having a zero in the column lately appended. This gives rise to a $2q$-point matroid, and ensures that there now is an $s$ (the indicator of the appended column) such that the causal submatroid (quadratic residue matroid) is defined by non-orthogonality of the rows to that $s$.

• Reorder the rows randomly. This has no effect on the matroid that the matrix represents, nor on the hidden causal submatroid. Nor does it affect $s$, the 'direction' in which the submatroid is hidden.

• Now column-reduce the matrix. There is no (desirable) structure within the 
particular form of the matrix before column-reduction, nothing that affects 
either codes or matroids. Echelon-reduction provides a canonical representa- 
tive for the overall matroid, while stripping away any redundant columns that 
would otherwise cost an unnecessary qubit, when interpreted as an X-program. 
By providing a canonical representative, it closes down the possibility that information in Alice's original construction of a basis for her causal code might
leak through to Bob, which might be useful to him in guessing s. Rather more 
importantly, this reduction actually serves to hide s. (We can be sure by zero- 
knowledge reasoning that this hiding process is random : echelon reduction is 
canonical and therefore supervenes any column-scrambling process, including 
a random one.) 

• Finally, one might sort the rows, though this is unnecessary. The resulting matrix is the one to publish. It will have at least $(q + 1)/2$ columns, since that is the rank of the causal submatroid hidden inside.
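A toy rendering of this recipe is sketched below, under assumptions that the text does not fix. The causal matrix is taken to be the $q \times q$ circulant whose columns are the rotations of the generator (its column space is the QR code, and column reduction later removes the redundancy), exactly $q$ obfuscating rows are added, and 'column-reduce' is implemented as reduced echelon form of the column space over $\mathbb{F}_2$. Because each reduced column is the only one with a 1 in its pivot row, the new secret $s$ can be read off the causal-row indicator at the pivot rows. The function name `obfuscate` is hypothetical, and this is not the challenge generator mentioned in footnote 6.

```python
import random


def qr_generator(q):
    """Indicator of the non-zero quadratic residues mod q (Euler's criterion)."""
    return [1 if j % q and pow(j, (q - 1) // 2, q) == 1 else 0 for j in range(q)]


def obfuscate(q, extra_rows=None, seed=None):
    """Toy version of Alice's recipe; returns (published matrix, secret s, causal-row flags)."""
    rng = random.Random(seed)
    extra_rows = q if extra_rows is None else extra_rows
    g = qr_generator(q)
    # Causal rows: the circulant whose columns are the rotations of g, plus a column of 1s.
    causal = [[g[(i - k) % q] for k in range(q)] + [1] for i in range(q)]
    # Obfuscating rows: random, except for a 0 in the appended column.
    noise = [[rng.randint(0, 1) for _ in range(q)] + [0] for _ in range(extra_rows)]
    tagged = [(r, 1) for r in causal] + [(r, 0) for r in noise]
    rng.shuffle(tagged)                          # random row order
    rows = [r for r, _ in tagged]
    t = [flag for _, flag in tagged]             # secret: which rows are causal
    m = len(rows)
    # Column reduction: each column is an m-bit integer (bit i = row i); keep the
    # column space in reduced echelon form, one vector per pivot row.
    basis = {}
    for c in range(q + 1):
        x = sum(rows[i][c] << i for i in range(m))
        for p, b in basis.items():
            if x >> p & 1:
                x ^= b
        if x:
            p = x.bit_length() - 1
            for k in basis:
                if basis[k] >> p & 1:
                    basis[k] ^= x
            basis[p] = x
    pivots = sorted(basis)
    published = [[basis[p] >> i & 1 for p in pivots] for i in range(m)]
    # The causal indicator t is a column of the original matrix, hence a combination
    # of the reduced columns; its coefficients are its values at the pivot rows.
    s = [t[p] for p in pivots]
    return published, s, t
```

A published row is then non-orthogonal to the returned $s$ exactly when it came from the causal code, while the row order and the column basis no longer reveal which rows those are.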

5.5.4 Mathematical problem description 

What this method of obfuscation amounts to, mathematically speaking, is a situation whereby for each suitable prime $q$, we start by acknowledging a particular (public) $q$-point binary matroid $Q$, viz. the one obtained from the QR-Code of length $q$. Then an 'instance' of the obfuscation consists of a published $2q$-point (say) binary matroid $P$; there is to be a hidden 'obfuscation' subset $O$ such that $Q = P \setminus O$; and the practical instances occur with $P$ chosen effectively at random, subject only to these constraints. (One could choose to make $O$ bigger than $q$ points if that were desired.) This has the feel of a fairly generic hidden substructure problem, so it seems likely that it should be NP-hard to determine the location of the hidden $Q$, given $P$ and the appropriate promise of $Q$'s existence within. More syntactically, we should like to prove that it is NP-complete to decide the related matter of whether or not $P$ is of the specified form, given only a matrix for $P$. Clearly this problem is in NP, since one could provide $Q$ in the appropriate basis as an explicit witness. We conjecture this problem to be NP-complete.

Conjecture 5.5.3. The language of matroids P that contain a quadratic-residue 
code submatroid Q by point deletion, where the size of Q is at least half the size of 
P, is NP-complete under polytime reductions. 

These sorts of conjecture are apparently independent of conjectures about hardness of classical efficient IQP simulation, since they indicate that actually identifying the hidden data is hard, even (presumably) for a universal quantum computer. And
even should this conjecture prove false, we know of no reason to think that a quan- 
tum computer would be much better than a classical one at finding the hidden Q, 
notwithstanding Grover's quadratic speed-up for exhaustive search. 

One might compare the structure of Conjecture 5.5.3 to that of the following im- 
portant theorem from graph theory : 
Proposition 5.5.4. The language of graphs G that contain a complete graph K by
vertex deletion, where the size of K is at least half the size of G, is NP-complete 
under polytime reductions. 

This is a classic result. See e.g. [49], where the problem in different guises is called 
'Clique' and 'Independent Set' and 'Node Cover'. Nonetheless, we know of no way 
to adapt the proof to fit Conjecture 5.5.3. 

Challenge 

It seems reasonable to conjecture that, using the methodology described, with a QR-code having a value $q \approx 500$, it is very easy to create randomised Interactive Game challenges for IQP-capability, whose distributions have large entropy, which should lead to datasets that would be easy to validate and yet infeasible to forge without an IQP-capable computing device (or knowledge of the secret $s$ vector). We propose such challenges as being appropriate 'targets' for early quantum architectures, since such challenges⁶ would essentially seem to be the simplest ones available (at least in terms of inherent temporal structure and number of qubits) that can't apparently be classically met.

5.6 Heuristics 

The idea behind this two-party protocol is essentially a cryptographic one. There is an analogy to Public Key Cryptography, if one thinks of $P$ as a public key, $s$ as a secret key, and IQP as a kind of 'computational trapdoor'. In this section, we attempt to push the analogy a little further, describing the best-known classical 'attack' strategies, and also give an accounting of our failure to find a decision language for proving the worth of IQP.

It is tempting to think that it would be desirable to have $P(X \cdot s^T = 0) = 1$, so that Bob stands a chance of finding many vectors that are surely orthogonal to $s$, thereby allowing for actually learning $s$ via Gaussian elimination, thus genuinely computing something non-trivial. But we shall see (Theorem 5.6.3) that this is precisely the condition that makes $s$ efficiently learnable using the classical techniques described below. This is why the code selected for the construction in §5.5.3 gave a bias of $0.854\ldots$, well below 1. For the same reason, it seems hard to find decision languages that plausibly lie in the difference $\mathrm{BPP}^{\mathrm{IQP}} \setminus \mathrm{BPP}$.



6 Accordingly, Michael Bremner and I have posted on the internet a challenge problem of size 
q = 487 (http://quantumchallenges.wordpress.com), to help motivate further study. This challenge 
website includes the source code (C) used to make the challenge matrix, and also the source code 
of the program that we would use to check candidate solutions, excluding only the secret seed value 
that we used to randomise the problem. 
Directional derivatives

Suppose we wish to construct a probability distribution that arises from some purely classical methods, which can be used to approximate our IQP distribution. Our motivation here is to check whether any purported application for an IQP oracle might not be efficiently implemented without any quantum technology. We proceed using the relatively ad hoc methods of linear differential cryptanalysis.

For the case $\theta = \pi/8$, we will need to consider only second-order derivatives. The same sort of method will apply to the case $\theta = \pi/2^{d+1}$ using $d$th-order derivatives, but the presentation would not be improved by considering that general case here.

In terms of a binary matrix/X-program $P$, proceed by defining

$$f : \mathbb{F}_2^n \to \mathbb{Z}/16\mathbb{Z}, \qquad f(a) = \sum_{p \in P} (-1)^{p \cdot a^T} \pmod{16}, \qquad (5.20)$$

and notate discrete directional derivatives as

$$f_d(a) = f(a) - f(a \oplus d) \pmod{16}. \qquad (5.21)$$

Consider also the second derivatives of $f$, given by

$$
\begin{aligned}
f_{d,e}(a) &= f_e(a) - f_e(a \oplus d) \pmod{16} \\
&= 2 \sum_{p \in P_e} (-1)^{p \cdot a^T}\big(1 - (-1)^{p \cdot d^T}\big) \pmod{16} \\
&= 4 \sum_{p \in P_d \cap P_e} (-1)^{p \cdot a^T} \pmod{16} \\
&= 4 \sum_{p \in P_d \cap P_e} \prod_{j : p_j = 1} (1 - 2a_j) \pmod{16} \\
&= \sum_{p \in P_d \cap P_e} \Big(4 + 8 \sum_{j : p_j = 1} a_j\Big) \pmod{16}, \qquad (5.22)
\end{aligned}
$$

each of which is quite patently a linear function in the bits $(a_1, \ldots, a_n)$ of $a$, as a function with codomain the ring $\mathbb{Z}/16\mathbb{Z}$, regardless of the choice of directions $d, e$.
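Since lines (5.20) to (5.22) are pure bookkeeping, they can be checked exhaustively on a small example. The sketch below evaluates the second derivative straight from the definitions and compares it with the final form in (5.22) for every $a$, $d$, $e$. Function names are hypothetical; values are kept in $\{0, \ldots, 15\}$ to represent $\mathbb{Z}/16\mathbb{Z}$.

```python
import itertools


def dot(u, v):
    return sum(x & y for x, y in zip(u, v)) % 2


def xor(u, v):
    return tuple(x ^ y for x, y in zip(u, v))


def f(P, a):
    """f(a) = sum_p (-1)^(p.a^T) mod 16, as in (5.20)."""
    return sum((-1) ** dot(p, a) for p in P) % 16


def f_d(P, d, a):
    """First directional derivative (5.21)."""
    return (f(P, a) - f(P, xor(a, d))) % 16


def f_de(P, d, e, a):
    """Second derivative f_{d,e}(a) = f_e(a) - f_e(a xor d), first line of (5.22)."""
    return (f_d(P, e, a) - f_d(P, e, xor(a, d))) % 16


def f_de_closed_form(P, d, e, a):
    """Last line of (5.22): sum over p in P_d and P_e of 4 + 8 * sum_{j: p_j = 1} a_j, mod 16."""
    return sum(4 + 8 * sum(aj for pj, aj in zip(p, a) if pj)
               for p in P if dot(p, d) and dot(p, e)) % 16


if __name__ == "__main__":
    P = [(1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1)]
    pts = list(itertools.product((0, 1), repeat=4))
    print(all(f_de(P, d, e, a) == f_de_closed_form(P, d, e, a)
              for d in pts for e in pts for a in pts))
```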

Lemma 5.6.1. With $f$ defined as per line (5.20), and $X$ the random variable of Lemma 5.4.1, for all $s$,

$$P(X \cdot s^T = 0) = \mathbb{E}_a\Big[\cos^2\Big(\frac{\pi}{16} f_s(a)\Big)\Big], \qquad (5.23)$$

and so the IQP probability distribution (in the case $\theta = \pi/8$) may be viewed as a function of $f$ rather than as a function of $P$.



Proof. Starting from the proof of Theorem 5.4.6, line (5.13), 
(5.24)



The second line above is obtained immediately from the first, using the definition of $f$. The third line follows because the expression is real-valued. The conclusion
follows from a basic trigonometric identity, and linearity of the expectation operator. 

□ 
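Lemma 5.6.1 can be spot-checked against a brute-force computation of the output distribution. The sketch below, with hypothetical names, evaluates the right side of (5.23) from $f$ alone, using $f_s(a) = f(a) - f(a \oplus s)$, and compares it with $P(X \cdot s^T = 0)$ from an exact statevector for $\exp(i\frac{\pi}{8}\sum_p X^p)|0\rangle$; the sign convention of the exponent does not affect the probabilities.

```python
import cmath
import itertools
import math


def dot(u, v):
    return sum(x & y for x, y in zip(u, v)) % 2


def xor(u, v):
    return tuple(x ^ y for x, y in zip(u, v))


def f(P, a):
    """f(a) = sum_p (-1)^(p.a^T) mod 16, as in (5.20)."""
    return sum((-1) ** dot(p, a) for p in P) % 16


def bias_from_f(P, s):
    """Right-hand side of (5.23): E_a cos^2(pi/16 * f_s(a))."""
    pts = list(itertools.product((0, 1), repeat=len(P[0])))
    return sum(math.cos(math.pi / 16 * ((f(P, a) - f(P, xor(a, s))) % 16)) ** 2
               for a in pts) / len(pts)


def bias_exact(P, s, theta=math.pi / 8):
    """P(X.s^T = 0) from a brute-force statevector of exp(i theta sum_p X^p)|0>."""
    n = len(P[0])
    pts = list(itertools.product((0, 1), repeat=n))
    phase = {a: cmath.exp(1j * theta * sum((-1) ** dot(p, a) for p in P)) for a in pts}
    prob = {x: abs(sum(phase[a] * (-1) ** dot(a, x) for a in pts)) ** 2 / 4 ** n
            for x in pts}
    return sum(pr for x, pr in prob.items() if dot(x, s) == 0)


if __name__ == "__main__":
    P = [(1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]
    for s in [(1, 0, 0, 0), (0, 1, 1, 1)]:
        print(s, bias_from_f(P, s), bias_exact(P, s))
```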



And so if there is a hidden $s$ such that $P(X \cdot s^T = 0) = 1$, then that implies $f_s(a) \equiv 0 \pmod{16}$ for all $a$. This is essentially a non-oracular form of the kind of function that arises in applications of Simon's Algorithm (cf. [66]), with $s$ playing the role of a hidden shift. One could find linear equations for such an $s$ if it exists, because it would follow immediately that $f_a(s) = f_a(0)$ for all $a$, and hence $f_{d,e}(s) = f_{d,e}(0)$ for any directions $d, e$, which (by line (5.22)) is equivalent to

$$\Big(\sum_{p \in P_d \cap P_e} p\Big) \cdot s^T = 0. \qquad (5.25)$$



Classical sampling 

To make use of this specific second-order differential property, we need to analyse the probability distribution that a classical player can generate efficiently from it. Proceed by defining a new probability distribution for a new random variable $Y$, as follows:

$$P(Y = y) = P_{d,e}\Big(\sum_{p \in P_d \cap P_e} p = y\Big). \qquad (5.26)$$

This may be classically rendered, simply by choosing $d, e \in \mathbb{F}_2^n$ independently with a uniform distribution, and then returning the sum of all rows in $P$ that are orthogonal to neither $d$ nor $e$.
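The classical sampler just described takes only a few lines. In the sketch below (names hypothetical), the random source is Python's `random.Random`, and each sample costs $O(|P|\,n)$ bit operations.

```python
import random


def dot(u, v):
    return sum(x & y for x, y in zip(u, v)) % 2


def sample_Y(P, rng):
    """One sample of Y: uniform d, e, then the F_2 sum of the rows p with p.d^T = p.e^T = 1."""
    n = len(P[0])
    d = [rng.randint(0, 1) for _ in range(n)]
    e = [rng.randint(0, 1) for _ in range(n)]
    y = [0] * n
    for p in P:
        if dot(p, d) and dot(p, e):          # p is orthogonal to neither d nor e
            y = [yi ^ pi for yi, pi in zip(y, p)]
    return tuple(y)


if __name__ == "__main__":
    rng = random.Random(0)
    P = [(1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1)]
    print([sample_Y(P, rng) for _ in range(5)])
```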

Lemma 5.6.2. The classically simulable distribution on the random variable $Y$ defined in line (5.26) satisfies

$$
\begin{aligned}
P(Y \cdot s^T = 0) &= P\big(c_1 \cdot c_2^T = 0 \mid c_1, c_2 \sim C_s\big) && (5.27) \\
&= \tfrac{1}{2}\big(1 + 2^{-\mathrm{rank}(P_s^T \cdot P_s)}\big), && (5.28)
\end{aligned}
$$

and so the bias of $Y$ in direction $s$ is a function of the matroid $P_s$.

Proof. Starting from line (5.26),

$$
\begin{aligned}
P(Y \cdot s^T = 0) &= \sum_{y : y \cdot s^T = 0} P_{d,e}\Big(\sum_{p \in P_d \cap P_e} p = y\Big) \\
&= P_{d,e}\Big(\Big(\sum_{p \in P_d \cap P_e} p\Big) \cdot s^T = 0\Big) \\
&= P_{d,e}\big(\mathrm{wt}(P \cdot d^T \wedge P \cdot e^T \wedge P \cdot s^T) \equiv 0 \pmod 2\big) \\
&= P_{d,e}\big(\mathrm{wt}(P_s \cdot d^T \wedge P_s \cdot e^T) \equiv 0 \pmod 2\big) \\
&= P_{d,e}\big(d \cdot P_s^T \cdot P_s \cdot e^T = 0\big). \qquad (5.29)
\end{aligned}
$$

The wedge operator $\wedge$ here denotes the logical AND between binary column vectors.

The first line of the Lemma follows from the direct substitutions $c_1 = P_s \cdot d^T$, $c_2 = P_s \cdot e^T$. The second line follows because unimodular actions on the left or right of a quadratic form (such as $P_s^T \cdot P_s$) affect neither its rank nor the probabilities derived from it; so it suffices to consider the cases where it is in Smith Normal Form, i.e. diagonal, which are trivially verified. Since this expression is patently invariant under invertible linear action on the right and permutation action on the left of $P_s$, it too is a matroid invariant. □
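Lemma 5.6.2 can be verified exactly for small $n$ by enumerating every pair $(d, e)$. The sketch below (hypothetical names) compares that enumeration with the rank formula (5.28), where $P_s^T \cdot P_s$ is the $n \times n$ Gram matrix over $\mathbb{F}_2$ of the columns of $P_s$.

```python
import itertools


def dot(u, v):
    return sum(x & y for x, y in zip(u, v)) % 2


def gf2_rank(rows):
    """Rank over F_2 of a list of 0/1 lists (one stored vector per leading bit)."""
    pivots = {}
    for v in rows:
        x = int("".join(map(str, v)), 2)
        while x:
            top = x.bit_length() - 1
            if top not in pivots:
                pivots[top] = x
                break
            x ^= pivots[top]
    return len(pivots)


def y_bias_enumerated(P, s):
    """P(Y.s^T = 0) by summing over all pairs (d, e), following (5.26) and (5.29)."""
    pts = list(itertools.product((0, 1), repeat=len(P[0])))
    good = 0
    for d in pts:
        for e in pts:
            rows = [p for p in P if dot(p, d) and dot(p, e)]   # rows in both P_d and P_e
            good += sum(dot(p, s) for p in rows) % 2 == 0
    return good / len(pts) ** 2


def y_bias_formula(P, s):
    """Right-hand side of (5.28): (1 + 2^-rank(P_s^T P_s)) / 2."""
    Ps = [p for p in P if dot(p, s)]
    n = len(P[0])
    gram = [[sum(p[i] & p[j] for p in Ps) % 2 for j in range(n)] for i in range(n)]
    return (1 + 2.0 ** -gf2_rank(gram)) / 2


if __name__ == "__main__":
    P = [(1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]
    for s in [(1, 0, 0, 0), (1, 1, 0, 1)]:
        print(s, y_bias_enumerated(P, s), y_bias_formula(P, s))
```

As a usage example, rows $(1, x)$ for all $x \in \mathbb{F}_2^3$ span the doubly even $[8,4,4]$ code; with $s = (1,0,0,0)$ both functions return 1, consistent with Theorem 5.6.3 below.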

Correlation 

Thus we have established some kind of correlation between random variables X 
and Y. 

Theorem 5.6.3. In the established notation, for X-programs with fixed 6 = tt/8, 

P(X-s T = 0) = l => F(Y-s T = 0) = l. (5.30) 

Proof. By Theorem 5.4.6, the antecedent gives $n_s \equiv 2\operatorname{wt}(c) \pmod 8$ for all $c \in C_s$, where $n_s$ is again the length of the code $C_s$. This entails that every codeword in $C_s$ has the same weight modulo 4, including the null codeword, so $C_s$ must be doubly even (which means every codeword has a weight that is a multiple of 4). It is easy to see that doubly even linear codes are self-orthogonal (which means that every codeword is orthogonal to every codeword, itself included): for $c_1, c_2 \in C_s$ we have $\operatorname{wt}(c_1 + c_2) = \operatorname{wt}(c_1) + \operatorname{wt}(c_2) - 2\operatorname{wt}(c_1 \wedge c_2)$, and since all three weights are multiples of 4, $\operatorname{wt}(c_1 \wedge c_2)$ is even, i.e. $c_1^T \cdot c_2 = 0$. By Lemma 5.6.2, the consequent is obtained. □
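The two coding-theoretic facts used in this proof can be checked directly on small examples. The following Python sketch (helper names hypothetical) enumerates the codewords spanned by a generator matrix and tests the doubly even and self-orthogonality properties. The $[8,4,4]$ first-order Reed-Muller code used here is a standard doubly even code, chosen purely for illustration and not taken from §5.5.

```python
import itertools
import random


def codewords(gen):
    """Set of all GF(2) linear combinations of the rows of gen."""
    n = len(gen[0])
    words = set()
    for coeffs in itertools.product((0, 1), repeat=len(gen)):
        words.add(tuple(sum(c & g[j] for c, g in zip(coeffs, gen)) % 2 for j in range(n)))
    return words


def is_doubly_even(words):
    """Every codeword has weight divisible by 4."""
    return all(sum(w) % 4 == 0 for w in words)


def is_self_orthogonal(words):
    """Every pair of codewords (a word with itself included) has even overlap."""
    return all(sum(a & b for a, b in zip(u, v)) % 2 == 0 for u in words for v in words)


# Generator matrix of the [8,4,4] first-order Reed-Muller code RM(1,3).
RM13 = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
C = codewords(RM13)
assert is_doubly_even(C) and is_self_orthogonal(C)

# Doubly even implies self-orthogonal on random small codes as well.
random.seed(0)
for _ in range(200):
    words = codewords([[random.randint(0, 1) for _ in range(8)] for _ in range(3)])
    if is_doubly_even(words):
        assert is_self_orthogonal(words)

# The converse fails: {00, 11}, the column space of a P_s with two equal rows,
# is self-orthogonal but not doubly even.
pair = {(0, 0), (1, 1)}
assert is_self_orthogonal(pair) and not is_doubly_even(pair)
```

The last check illustrates the repeated-rows remark that follows: since the antecedent of (5.30) forces $C_s$ to be doubly even, a code such as {00, 11} violates the antecedent while still giving Y bias 1 in direction s.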

The only counterexamples to the converse implication seem to occur in the trivial cases where the binary matroid $P_s$ has circuits of length 2, i.e. where $P_s$ has repeated rows.

This random variable Y is the 'best classical approximation' that we have been able to find for X. (The intuition is that it captures all of the 'local' information in the function f, which is to say all the 'local' information in the matroid P, so that the only data left unaccounted for, and excluded from the construction of this classical distribution, is the 'non-local' matroid information, which is readily available to the quantum distribution via the magic of quantum superposition.) There seems to be no other sensible way of processing P (or f) classically to obtain useful samples efficiently, though it also seems hard to make any rigorous statement to that effect.

Conjecture 5.6.4. The classical method defined in this section, yielding random variable Y, is asymptotically classically optimal (when comparing average-case behaviour and restricting to polynomial time) for the simulation of IQP distributions arising from constant-action $\theta = \pi/8$ X-programs.

This conjecture lends credence to the design methodology of §5.5. It means that 
if Bob wishes to cheat, using classical techniques only and not expending sufficient 
time to search exhaustively for s, then so far as we are aware, the best he can 
realistically hope to do is to use this random variable Y to make data items 75% of 
which ought to be orthogonal to s, while hoping that in fact surprisingly many of 
them will turn out to be orthogonal to s, thereby 'fooling' Alice's hypothesis test.
His chances of succeeding naturally depend on how much data Alice requires for her 
hypothesis test, and how she trades off the probability of making a Type I error 
(accepting data sampled classically from Y, for example) versus the probability of 
making a Type II error (rejecting data despite its having been sampled from X using 
IQP methods). As far as we are aware, neither random variable X nor Y seems to
be particularly useful for actually learning s for sure. 
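The trade-off between the two error types can be made concrete for one simple, hypothetical choice of test: Alice accepts N data items if at least t of them are orthogonal to s. Assuming the items are independent, the count is binomial, with success probability 3/4 for data drawn from Y (by (5.28), a 75% rate corresponds to $\operatorname{rank}(P_s^T \cdot P_s) = 1$) and some presumably larger probability $p_X$ for data drawn from X. The value of $p_X$ is not stated in this section; the 0.85 used below is purely a placeholder, as are the threshold rule and the function names.

```python
from math import comb


def prob_at_least(n, p, t):
    """P(Bin(n, p) >= t): chance that at least t of n independent items pass."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))


def error_rates(n, t, p_y=0.75, p_x=0.85):
    """(Type I, Type II) error rates of a hypothetical threshold test.

    p_y = 3/4 is the rate for classical samples from Y; p_x = 0.85 is only a
    placeholder for the rate achieved by genuine samples from X.
    """
    type_one = prob_at_least(n, p_y, t)      # accepting classical data from Y
    type_two = 1 - prob_at_least(n, p_x, t)  # rejecting genuine data from X
    return type_one, type_two


for n in (50, 200, 800):
    t = (4 * n) // 5  # threshold midway between 0.75 and the placeholder 0.85
    alpha, beta = error_rates(n, t)
    print(f"N={n:4d} t={t:4d} type I={alpha:.2e} type II={beta:.2e}")
```

Raising t lowers the Type I error at the cost of a higher Type II error; only increasing N drives both down together, which is the dependence on how much data Alice requires described above.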

5.7 Summary 

We have made a thorough study of the simplest ('temporally unstructured') part 
of the Clifford-Diagonal hierarchy, by considering the mathematical structures that 
underpin the notion of an X-program or IQP oracle. We have looked at some
different methods for conceptually implementing such a computational process and,
using ideas from graph state computing (§5.3), have found an implementation for
IQP within logarithmic quantum-circuit depth (cf. [44]).

Specialising to constant-action X-programs with $\theta = \pi/8$, we have shown that, modulo post-selection (§3.2), they are as powerful as BQP computation, despite the fact that they contain no temporal structure, and that they can always be rewritten as 3-local X-programs (§5.4.3). We have proposed a family of easily described challenge problems (§5.5) that seems to capture well the complexity of this kind of problem, in the context of a two-party protocol, exploiting a natural cryptographic analogy (§5.6).

We have given several conjectures of an open-ended nature, to indicate directions 
for possible future work. We might also recommend the further study of matroid 
invariants through quantum techniques, or perhaps the invariants of weighted matroids,
since they seem to be the natural objects of IQP computation as hitherto
circumscribed. This would seem to be fertile ground for developing examples of 
things that only genuine quantum computers can achieve. 

Note that if it weren't for the correlation described in Theorem 5.6.3, it would be possible to conceive of a mechanism whereby an IQP-capable device could compute an actual secret or witness to something (e.g. learn s), so that the computation wouldn't require two rounds of player interaction to achieve something non-trivial. Yet as it stands, it is an open problem to suggest tasks for this paradigm involving neither communication nor multi-party concepts.